SlideShare a Scribd company logo
1 of 70
DataFinder:  Organizing the Data Chaos of Scientists PyCon UK 2008 (September 12 th , 2008, Birmingham) Andreas Schreiber < Andreas.Schreiber@dlr.de> German Aerospace Center (DLR), Cologne http://www.dlr.de/sc
[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],Sites and employees    Köln    Lampoldshausen    Stuttgart    Oberpfaffenhofen Braunschweig       Göttingen Berlin -      Bonn Trauen      Hamburg    Neustrelitz Weilheim    Bremen -  
Short Overview ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Introduction ,[object Object],[object Object]
Introduction Background ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Introduction Data Management Problem ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder History Search for solution for scientific data management ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder Development From Java Prototype to Python Product… ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Python for Scientists and Engineers Reasons for Python in Research and Industry ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],“ I want to design  planes,  not software!”
“ Python has the cleanest,  most-scientist- or engineer  friendly syntax and semantics. Paul F. Dubois Paul F. Dubois. Ten good practices in scientific programming. Comp. In Sci. Eng., Jan/Feb 1999, pp.7-11
DataFinder Overview Basic Concept ,[object Object],[object Object],[object Object]
WebDAV Web-based Distributed Authoring & Versioning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder Overview Client and Server ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder Client Graphical User Interfaces User Client Administrator Client Implementation in Python with Qt/PyQt
DataFinder Server Supported WebDAV servers ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
WebDAV / Meta Data Server (1) Tamino WebDAV Server ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
WebDAV / Meta Data Server (2) Apache + mod_dav ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Apache Core Server mod_http mod_auth_ldap mod_dav mod_dav_fs File system
WebDAV / Meta Data Server (3) Catacomb ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Apache Core Server mod_http mod_auth_ldap mod_dav mod_dav_fs File system DB (MySQL) Catacomb mod_dav_repos
Mass Data Storage Data Stores Logical   View User   Client Storage  Locations
DataFinder  Technical Aspects ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Python API  User Client Extension with GUI import   threading from  datafinder.application  import  search_support from  datafinder.gui.user  import  facade def  searchAndDisplayResult(): &quot;&quot;&quot;Searches and displays the result in the  search result logging window. &quot;&quot;&quot; query =  &quot;displayname contains ‘test’ OR displayname == ‘ab’&quot; result = search_support.performSearch(query) resultLogger = facade.getSearchResultLogger() for path in result.keys(): resultLogger.info( &quot;Found item %s.&quot;  % path)  thread = threading.Thread(target=searchAndDisplayResult) thread.start()
Python API  Command Line Example (without GUI) # Get API from  datafinder.application  import  ExternalFacade externalFacade = ExternalFacade.getInstance() # Connect to a repository externalFacade.performBasicDatafinderSetup(username,    password,    startUrl) # Download the whole content rootItem = externalFacade.getRootWebdavServerItem() items = externalFacade.getCollectionContents(rootItem) for item in items: externalFacade.downloadFile(item, baseDirectory)
Additional “Batteries”… Used Libraries beyond the Python Standard Library (1) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Additional “Batteries”… Used Libraries beyond the Python Standard Library (2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
WebDAV Client Library Support for DAV Extensions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Simple Use Case: File Upload and Search
 
 
 
 
 
 
 
 
 
 
 
 
Working with DataFinder…
Configuration and Customization Preparing DataFinder for certain “use cases” ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder Configuration Data Model and Data Stores ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Structure Mapping of Organizational Data Structures User Object (collection) Object (file) Relation Attributes (meta data) Project A Project B Project C File 1 File 2 Simulation I Experiment Simulation II Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value
Meta Data ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Impact for Users ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],“ Damn! I’m a great scientist! I want freedom to have  my own directory layout…”
Customization   Python-Scripting for Extension and Automation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder Scripting  Downloading File and Starting Application # Download the selected file and try to execute it. from  datafinder.application  import  ExternalFacade from  guitools.easygui  import  * import  os from  tempfile  import  * from  win32api  import  ShellExecute # Get instance of ExternalFacade to access DataFinder API facade = ExternalFacade.getInstance() # Get currently selected collection in DataFinder Server-View  resource = facade.getSelectedResource() if  resource != None: tmpFile = mktemp(ressource.name) facade.downloadFile(resource, tmpFile) if  os.path.exists(tmpFile): ShellExecute(0, None, tmpFile,  &quot;&quot; ,  &quot;&quot; , 1) else : msgbox( &quot;No file selected to execute.&quot; )
Examples…
Example 1: Turbine Simulation
Example 1:  Fluid Dynamics Simulation Turbine Simulation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Turbine Simulation Data Model
Turbine Simulation: Graphical User Interface
Turbine Simulation: Customized GUI Extensions ,[object Object],[object Object],[object Object],[object Object],[object Object],1 2 3 4 5
Turbine Simulation  Starting External Applications ,[object Object],[object Object],[object Object],1 2 3
Example 2: Automobile Supplier
Example 2:  Automobile Supplier DataFinder for Simulation and Data Management  ,[object Object],[object Object],[object Object],[object Object]
Automobile Supplier Data Model
Automobile Supplier Configuration of Customers Parameters
Automobile Supplier Management of Simulations ,[object Object],[object Object],[object Object],[object Object]
Automobile Supplier Upload, Download, and Versioning of Files ,[object Object],[object Object],[object Object]
Example 3: Air Traffic Management
Example 3:  Air Traffic Monitoring   Database for Air Traffic Monitoring ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Database for Air Traffic Monitoring Data Model and Data Migration
Database for Air Traffic Monitoring Data Import Wizard ,[object Object],[object Object],[object Object]
Database for Air Traffic Monitoring Search Results
Current Work and Future Plans  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Am Ende… Hinweise ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Availability ,[object Object],[object Object],[object Object],[object Object]
Links ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Questions?

More Related Content

What's hot

Introduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Introduction to Lucidworks Fusion - Alexander Kanarsky, LucidworksIntroduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Introduction to Lucidworks Fusion - Alexander Kanarsky, LucidworksLucidworks
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsMichael Häusler
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastDatabricks
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceDatabricks
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityFabian Hueske
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchangelagoze
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™Databricks
 
Advanced Analytics using Apache Hive
Advanced Analytics using Apache HiveAdvanced Analytics using Apache Hive
Advanced Analytics using Apache HiveMurtaza Doctor
 
160606 data lifecycle project outline
160606 data lifecycle project outline160606 data lifecycle project outline
160606 data lifecycle project outlineIan Duncan
 
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, LucidworksLifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, LucidworksLucidworks
 
Automated Multiplatform Compilation and Validation of a Collaborative Reposit...
Automated Multiplatform Compilation and Validation of a Collaborative Reposit...Automated Multiplatform Compilation and Validation of a Collaborative Reposit...
Automated Multiplatform Compilation and Validation of a Collaborative Reposit...Ángel Jesús Martínez Bernal
 
Up-front Design Considerations in FHIR Data Modeling
Up-front Design Considerations in FHIR Data Modeling Up-front Design Considerations in FHIR Data Modeling
Up-front Design Considerations in FHIR Data Modeling RezaAbholhassni
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the artStavros Kontopoulos
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big dealeduarderwee
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceGlobus
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseLucidworks
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerMark Tabladillo
 

What's hot (20)

Introduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Introduction to Lucidworks Fusion - Alexander Kanarsky, LucidworksIntroduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Introduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
 
RSI at the HDF & HDF-EOS Workshop VI
RSI at the HDF & HDF-EOS Workshop VIRSI at the HDF & HDF-EOS Workshop VI
RSI at the HDF & HDF-EOS Workshop VI
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit East
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchange
 
SomeSlides
SomeSlidesSomeSlides
SomeSlides
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
 
Advanced Analytics using Apache Hive
Advanced Analytics using Apache HiveAdvanced Analytics using Apache Hive
Advanced Analytics using Apache Hive
 
160606 data lifecycle project outline
160606 data lifecycle project outline160606 data lifecycle project outline
160606 data lifecycle project outline
 
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, LucidworksLifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
 
Automated Multiplatform Compilation and Validation of a Collaborative Reposit...
Automated Multiplatform Compilation and Validation of a Collaborative Reposit...Automated Multiplatform Compilation and Validation of a Collaborative Reposit...
Automated Multiplatform Compilation and Validation of a Collaborative Reposit...
 
Up-front Design Considerations in FHIR Data Modeling
Up-front Design Considerations in FHIR Data Modeling Up-front Design Considerations in FHIR Data Modeling
Up-front Design Considerations in FHIR Data Modeling
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the art
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big deal
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL Server
 

Similar to Organizing Scientists' Data Chaos with DataFinder

DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)Data Finder
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...Andreas Schreiber
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSStéphane Fréchette
 
Company Visitor Management System Report.docx
Company Visitor Management System Report.docxCompany Visitor Management System Report.docx
Company Visitor Management System Report.docxfantabulous2024
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsGuido Schmutz
 
File Repository on GAE
File Repository on GAEFile Repository on GAE
File Repository on GAElynneblue
 
Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01Tony Frame
 
20090701 Climate Data Staging
20090701 Climate Data Staging20090701 Climate Data Staging
20090701 Climate Data StagingHenning Bergmeyer
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataHostedbyConfluent
 
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...Naoki (Neo) SATO
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of MetadataJim Dowling
 
Understanding the Windows Azure Platform - Dec 2010
Understanding the Windows Azure Platform - Dec 2010Understanding the Windows Azure Platform - Dec 2010
Understanding the Windows Azure Platform - Dec 2010DavidGristwood
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityWes McKinney
 
ArcReady - Architecting For The Cloud
ArcReady - Architecting For The CloudArcReady - Architecting For The Cloud
ArcReady - Architecting For The CloudMicrosoft ArcReady
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Lace Lofranco
 

Similar to Organizing Scientists' Data Chaos with DataFinder (20)

DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Company Visitor Management System Report.docx
Company Visitor Management System Report.docxCompany Visitor Management System Report.docx
Company Visitor Management System Report.docx
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
 
File Repository on GAE
File Repository on GAEFile Repository on GAE
File Repository on GAE
 
Practical OData
Practical ODataPractical OData
Practical OData
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01
 
20090701 Climate Data Staging
20090701 Climate Data Staging20090701 Climate Data Staging
20090701 Climate Data Staging
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
 
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
 
Understanding the Windows Azure Platform - Dec 2010
Understanding the Windows Azure Platform - Dec 2010Understanding the Windows Azure Platform - Dec 2010
Understanding the Windows Azure Platform - Dec 2010
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
ArcReady - Architecting For The Cloud
ArcReady - Architecting For The CloudArcReady - Architecting For The Cloud
ArcReady - Architecting For The Cloud
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
 

More from Andreas Schreiber

Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...Andreas Schreiber
 
Visualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented RealityVisualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented RealityAndreas Schreiber
 
Provenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructureProvenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructureAndreas Schreiber
 
Raising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace CenterRaising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace CenterAndreas Schreiber
 
Open Source Licensing for Rocket Scientists
Open Source Licensing for Rocket ScientistsOpen Source Licensing for Rocket Scientists
Open Source Licensing for Rocket ScientistsAndreas Schreiber
 
Interactive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality HeadsetsInteractive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality HeadsetsAndreas Schreiber
 
Provenance for Reproducible Data Science
Provenance for Reproducible Data ScienceProvenance for Reproducible Data Science
Provenance for Reproducible Data ScienceAndreas Schreiber
 
Visualizing Provenance using Comics
Visualizing Provenance using ComicsVisualizing Provenance using Comics
Visualizing Provenance using ComicsAndreas Schreiber
 
Nachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-VerletzungenNachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-VerletzungenAndreas Schreiber
 
Reproducible Science with Python
Reproducible Science with PythonReproducible Science with Python
Reproducible Science with PythonAndreas Schreiber
 
A Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self DataA Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self DataAndreas Schreiber
 
Tracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The RestTracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The RestAndreas Schreiber
 
High Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris DataHigh Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris DataAndreas Schreiber
 
Bericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & ExpositionBericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & ExpositionAndreas Schreiber
 
Telemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermannTelemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermannAndreas Schreiber
 
Quantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-SensorenQuantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-SensorenAndreas Schreiber
 

More from Andreas Schreiber (20)

Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
 
Visualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented RealityVisualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented Reality
 
Provenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructureProvenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructure
 
Raising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace CenterRaising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace Center
 
Open Source Licensing for Rocket Scientists
Open Source Licensing for Rocket ScientistsOpen Source Licensing for Rocket Scientists
Open Source Licensing for Rocket Scientists
 
Interactive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality HeadsetsInteractive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality Headsets
 
Provenance for Reproducible Data Science
Provenance for Reproducible Data ScienceProvenance for Reproducible Data Science
Provenance for Reproducible Data Science
 
Visualizing Provenance using Comics
Visualizing Provenance using ComicsVisualizing Provenance using Comics
Visualizing Provenance using Comics
 
Quantified Self Comics
Quantified Self ComicsQuantified Self Comics
Quantified Self Comics
 
Nachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-VerletzungenNachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
 
Reproducible Science with Python
Reproducible Science with PythonReproducible Science with Python
Reproducible Science with Python
 
Python at Warp Speed
Python at Warp SpeedPython at Warp Speed
Python at Warp Speed
 
A Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self DataA Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self Data
 
Open Source im DLR
Open Source im DLROpen Source im DLR
Open Source im DLR
 
Tracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The RestTracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The Rest
 
High Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris DataHigh Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris Data
 
Bericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & ExpositionBericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & Exposition
 
Telemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermannTelemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermann
 
Big Python
Big PythonBig Python
Big Python
 
Quantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-SensorenQuantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-Sensoren
 

Recently uploaded

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Recently uploaded (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Organizing Scientists' Data Chaos with DataFinder

  • 1. DataFinder: Organizing the Data Chaos of Scientists PyCon UK 2008 (September 12 th , 2008, Birmingham) Andreas Schreiber < Andreas.Schreiber@dlr.de> German Aerospace Center (DLR), Cologne http://www.dlr.de/sc
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. “ Python has the cleanest, most-scientist- or engineer friendly syntax and semantics. Paul F. Dubois Paul F. Dubois. Ten good practices in scientific programming. Comp. In Sci. Eng., Jan/Feb 1999, pp.7-11
  • 12.
  • 13.
  • 14.
  • 15. DataFinder Client Graphical User Interfaces User Client Administrator Client Implementation in Python with Qt/PyQt
  • 16.
  • 17.
  • 18.
  • 19.
  • 20. Mass Data Storage Data Stores Logical View User Client Storage Locations
  • 21.
  • 22. Python API User Client Extension with GUI import threading from datafinder.application import search_support from datafinder.gui.user import facade def searchAndDisplayResult(): &quot;&quot;&quot;Searches and displays the result in the search result logging window. &quot;&quot;&quot; query = &quot;displayname contains ‘test’ OR displayname == ‘ab’&quot; result = search_support.performSearch(query) resultLogger = facade.getSearchResultLogger() for path in result.keys(): resultLogger.info( &quot;Found item %s.&quot; % path) thread = threading.Thread(target=searchAndDisplayResult) thread.start()
  • 23. Python API Command Line Example (without GUI) # Get API from datafinder.application import ExternalFacade externalFacade = ExternalFacade.getInstance() # Connect to a repository externalFacade.performBasicDatafinderSetup(username, password, startUrl) # Download the whole content rootItem = externalFacade.getRootWebdavServerItem() items = externalFacade.getCollectionContents(rootItem) for item in items: externalFacade.downloadFile(item, baseDirectory)
  • 24.
  • 25.
  • 26.
  • 27. Simple Use Case: File Upload and Search
  • 28.  
  • 29.  
  • 30.  
  • 31.  
  • 32.  
  • 33.  
  • 34.  
  • 35.  
  • 36.  
  • 37.  
  • 38.  
  • 39.  
  • 41.
  • 42.
  • 43. Data Structure Mapping of Organizational Data Structures User Object (collection) Object (file) Relation Attributes (meta data) Project A Project B Project C File 1 File 2 Simulation I Experiment Simulation II Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value
  • 44.
  • 45.
  • 46.
  • 47. DataFinder Scripting Downloading File and Starting Application # Download the selected file and try to execute it. from datafinder.application import ExternalFacade from guitools.easygui import * import os from tempfile import * from win32api import ShellExecute # Get instance of ExternalFacade to access DataFinder API facade = ExternalFacade.getInstance() # Get currently selected collection in DataFinder Server-View resource = facade.getSelectedResource() if resource != None: tmpFile = mktemp(ressource.name) facade.downloadFile(resource, tmpFile) if os.path.exists(tmpFile): ShellExecute(0, None, tmpFile, &quot;&quot; , &quot;&quot; , 1) else : msgbox( &quot;No file selected to execute.&quot; )
  • 49. Example 1: Turbine Simulation
  • 50.
  • 51.
  • 53.
  • 54.
  • 56.
  • 58. Automobile Supplier Configuration of Customers Parameters
  • 59.
  • 60.
  • 61. Example 3: Air Traffic Management
  • 62.
  • 63. Database for Air Traffic Monitoring Data Model and Data Migration
  • 64.
  • 65. Database for Air Traffic Monitoring Search Results
  • 66.
  • 67.
  • 68.
  • 69.