2020 VISION
ALEX HENDERSON
UNIVERSITY OF MANCHESTER
THE INTERNATIONAL SOCIETY FOR CLINICAL SPECTROSCOPY
SURFACESPECTRA LIMITED alexhenderson.info
@AlexHenderson00
@ChiToolbox
Kick-off Meeting
and Hackathon
3-6 Feb, 2020
Ruhr-University Bochum
DUBIOUS
DESIGN
DECISIONS
http://clirspec.org
@clirspec
34 members on
steering council
Travel bursaries available
http://spec2020.com
17–22 May, 2020
Summer School
7–10 July, 2020
https://springscix.org/
6–9 April, 2020
Travel bursaries available
22–27 August, 2021
CLIRSPEC DATA
Online community for us to share
algorithms, code and ideas
Hosted on Slack
Request an invitation to join
Any member can add anyone else
http://tiny.cc/clirspec-data
CHITOOLBOX
• https://bitbucket.org/AlexHenderson/chitoolbox
• Open source MATLAB toolbox
• Infrared, Raman, secondary ion mass spectrometry (SIMS)
• Spectra and hyperspectral images
• Library, not a GUI (only 1-2 dialog boxes)
• Object oriented design
@ChiToolbox
DESIGN DECISIONS
• What worked?
• What did not work?
• What were the compromises?
• What would I do differently?
OBJECT ORIENTED PROGRAMMING (OOP)
• Abstract base classes for spectra, spectral collections and images
• Concrete classes for above
• ‘Interface’ classes for ‘Raman character’, ‘IR character’ etc.
• Multiple inheritance to define technique specific classes
• eg. IRSpectrum, RamanImage, (ToF)MSSpectralCollection
• Separate classes for pictures, RMieS options, PCA or RF models etc.
• Model using classes where possible
• Provides type-identification and bespoke functionality
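The class layout above can be sketched in a few lines. This is a language-neutral illustration in Python, not ChiToolbox code: the class names `Spectrum`, `IRCharacter`, `RamanCharacter` and the methods `technique` and `describe` are assumptions chosen to mirror the bullet points.

```python
from abc import ABC, abstractmethod


class Spectrum(ABC):
    """Abstract base class: holds the data, but knows no technique."""

    def __init__(self, xvals, data):
        if len(xvals) != len(data):
            raise ValueError("xvals and data must have the same length")
        self.xvals = list(xvals)
        self.data = list(data)

    @abstractmethod
    def technique(self):
        """Name of the technique; supplied by an 'interface' class."""

    def describe(self):
        return f"{self.technique()} spectrum with {len(self.data)} points"


class IRCharacter:
    """'Interface' class carrying the infrared-specific behaviour."""

    def technique(self):
        return "infrared"


class RamanCharacter:
    """'Interface' class carrying the Raman-specific behaviour."""

    def technique(self):
        return "Raman"


# Multiple inheritance: the character class is listed first so that its
# technique() overrides the abstract one in the base class.
class IRSpectrum(IRCharacter, Spectrum):
    pass


class RamanSpectrum(RamanCharacter, Spectrum):
    pass
```

With this layout `isinstance(s, IRCharacter)` gives the type identification, and anything technique-specific lives in one place.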
FILE FORMATS
• Agilent (FTIR)
• Single FTIR images and mosaicked FTIR images
• Biotof (ToFSIMS)
• Spectra, hyperspectral image files
• Bruker (FTIR)
• Opus files and multiple spectra exported as a MAT file
• Ionoptika (ToFSIMS)
• Hyperspectral image files exported in HDF5 format
• Mettler Toledo (FTIR)
• Spectra exported in ASCII
• Renishaw (Raman)
• WiRE Version 4, spectral and hyperspectral images
• Thermo Fisher Scientific GRAMS SPC (Generic)
• Data stored in spc files
Single files can be read using ChiFile, which works out the file format automatically.
In addition, readable but unreleased:
• Photothermal (FTIR and Raman)
• mIRage spectra and hyperspectral images
• IONTOF (ToFSIMS)
• Hyperspectral images in grd format
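How ChiFile detects the format is not described on the slide. One common approach is to look at the first bytes of the file and fall back on the extension. The sketch below assumes that approach: the HDF5 signature and the Level 5 MAT-file header text are fixed by their formats, while `sniff_format` and the extension table are hypothetical.

```python
from pathlib import Path

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"     # 8-byte HDF5 signature (offset 0 in the usual case)
MAT5_PREFIX = b"MATLAB 5.0 MAT-file"  # start of a Level 5 MAT-file text header

# Hypothetical fallback table, for illustration only.
EXTENSION_HINTS = {".spc": "spc", ".grd": "grd"}


def sniff_format(filename):
    """Guess a file format from its leading bytes, then from its extension."""
    path = Path(filename)
    with open(path, "rb") as f:
        header = f.read(32)
    if header.startswith(HDF5_MAGIC):
        return "hdf5"
    if header.startswith(MAT5_PREFIX):
        return "mat"
    return EXTENSION_HINTS.get(path.suffix.lower(), "unknown")
```

Checking content before extension matters for the multi-purpose formats: an extension says who wrote the file, not what kind of data is inside.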
FILE FORMAT ISSUES
• Some formats were hacked eg. Agilent
• What if example files were specific to certain instrumentation?
• Some formats are multi-purpose
• Some formats hold only one data type (spectrum, line scan, image etc)
• Eg. Agilent single tile format
• Some formats can contain any of these data types
• Eg. Renishaw
• If we read multiple files, what should we do if their contents are of different types?
REUSE OF EXTERNAL CODE (CREDITED)
• Perceptually uniform colormaps
• error_ellipse
• GSTools
• shadedErrorBar
• mksqlite
• sql_object
• cividis
• sgolayfilt
• RMieS
• cluster-toolbox
• getSubclasses
• GUI Layout Toolbox
• DataHash
• GetFullPath
• m2html
• ImportOpus
• ME-EMSC
• Thresholding Tool
• ENVI file reader/writer
• read_envihdr
SOFTWARE LICENCE
• ChiToolbox released under GNU General Public License 3.0 (GPL)
• External code is GPL, or more liberal (eg. MIT)
• GPL ‘infects’ the codebase
• Users who distribute code that intrinsically links to this code must release it under the GPL
• Prefer GNU Lesser General Public License (LGPL)
• Your codebase is not affected, but changes must be shared
• Unfortunately, the compatibility is one-way: LGPL code can go into a GPL codebase, but GPL code cannot go into an LGPL one
MATLAB ISSUES
• Tried to make backwardly compatible with R2009a
• Too painful
• Roughly compatible with R2016a
• Trying to reduce toolbox dependencies (eg. Statistics toolbox)
• MATLAB OOP not great
• Variables pass by value, but handle classes pass by reference. Makes copying difficult
• Rolled my own deep copy mechanism (clone)
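The copying problem is easy to reproduce in any language with reference semantics. The sketch below is a Python analogue, not MATLAB and not the toolbox's own `clone`: Python objects behave like MATLAB handle objects, so plain assignment makes an alias and an explicit deep copy is needed for an independent object. The class name `SpectrumHandle` is hypothetical.

```python
import copy


class SpectrumHandle:
    """Stands in for a handle-class object: assignment shares, not copies."""

    def __init__(self, data):
        self.data = list(data)
        self.history = []

    def clone(self):
        # Deep copy: nested containers are duplicated as well.
        return copy.deepcopy(self)


original = SpectrumHandle([1.0, 2.0, 3.0])

alias = original            # second name for the same object
alias.data[0] = 99.0        # also changes original.data

independent = original.clone()
independent.data[1] = -1.0  # original is untouched
independent.history.append("edited")
```

After this runs, `original.data` is `[99.0, 2.0, 3.0]` and `original.history` is still empty.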
DATA TYPES
• Single spectrum, spectral collection, hyperspectral image
• Continuous data
• Did not consider multispectral data (discrete wavenumber)
• Discontinuous, cannot take first derivative etc.
• Data is a property of the object, not a pointer/function to a data storage type
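A guard like the following is one way to stop derivative-style operations being applied to discrete wavenumber data. It is a sketch under a strong assumption: that "continuous" can be approximated by near-uniform spacing of the x-axis. The function names and the 5% tolerance are hypothetical.

```python
def is_continuous(xvals, tolerance=0.05):
    """True when neighbouring x-values are (nearly) evenly spaced."""
    steps = [abs(b - a) for a, b in zip(xvals, xvals[1:])]
    if len(steps) < 2:
        return False
    smallest, largest = min(steps), max(steps)
    return smallest > 0 and (largest - smallest) / smallest <= tolerance


def first_derivative(xvals, data):
    """Forward finite difference; refuses discrete (multispectral) axes."""
    if len(xvals) != len(data):
        raise ValueError("xvals and data must have the same length")
    if not is_continuous(xvals):
        raise ValueError("axis is discrete; a derivative is not meaningful")
    return [
        (data[i + 1] - data[i]) / (xvals[i + 1] - xvals[i])
        for i in range(len(data) - 1)
    ]
```

For example, `first_derivative([1000, 1002, 1004, 1006], [0.0, 2.0, 6.0, 6.0])` gives `[1.0, 2.0, 0.0]`, while an axis such as `[1080, 1240, 1650, 2920]` raises `ValueError`.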
METADATA
• Separate class from data type
• Automatically label plots (eg PCA scores)
• Build lists of labels manually
labels = ChiClassMembership('mylabels','beta',1, 'gamma',2, 'beta',3, 'alpha',2);
• Automatically read from specially designed Excel spreadsheet
• Handles logical, category and numeric types
• Need to remove label from metadata if removing spectrum from collection
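The last point is the usual cost of keeping metadata in a separate object: the two can drift apart. A minimal sketch of one remedy, assuming a hypothetical `LabelledCollection` wrapper (not a ChiToolbox class), is to route every removal through a single method that updates both lists.

```python
class LabelledCollection:
    """Keeps spectra and their class labels in step with each other."""

    def __init__(self, spectra, labels):
        if len(spectra) != len(labels):
            raise ValueError("need exactly one label per spectrum")
        self.spectra = list(spectra)
        self.labels = list(labels)

    def remove(self, index):
        """Remove one spectrum together with its label."""
        del self.spectra[index]
        del self.labels[index]

    def members(self, label):
        """All spectra carrying the given label."""
        return [s for s, lab in zip(self.spectra, self.labels) if lab == label]
```

Because callers never touch the two lists directly, a label cannot outlive its spectrum.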
Users are often unsure of the difference between numeric and category types when using numbered samples
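The distinction can be shown with made-up sample numbers. Treated as numeric, the numbers are measurements and can be averaged; treated as categories, they are just names and can only be counted. The values below are invented for illustration.

```python
from collections import Counter
from statistics import mean

sample_ids = [1, 1, 2, 3, 3, 3]  # invented sample numbers, one per spectrum

# Numeric: the IDs are treated as quantities. The result is a valid number
# but says nothing useful about the samples.
as_numeric = mean(sample_ids)

# Category: the IDs are treated as names, so the natural summary is a count.
as_category = Counter(str(s) for s in sample_ids)
```

Here `as_category` reports two spectra for sample "1", one for "2" and three for "3", which is usually what a user with numbered samples wants.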
DEFAULTS
• Try to provide ‘reasonable’ default values
• PCA denoising defaults to 30% of PCs retained
• Random Forest defaults to 80% training and 20% test sets
• Should default to 5-fold cross validation, but takes time
• Random Forest defaults to using parallel processing if data set is large
• MATLAB is slow to initialise worker pool
• All parameters are user-configurable
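One way to keep defaults visible and overridable is to gather them in a single options object. The sketch below uses the two values stated above (30% and 80/20); the names, the reading of "30% of PCs" as a fraction of the component count, and the parallel-processing threshold are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Defaults:
    pca_retained_fraction: float = 0.30  # PCA denoising: keep 30% of PCs
    train_fraction: float = 0.80         # Random Forest: 80% train, 20% test
    parallel_threshold: int = 10_000     # hypothetical size cut-off


def pcs_to_retain(n_components, options=Defaults()):
    """Number of principal components kept for denoising (at least one)."""
    return max(1, round(n_components * options.pca_retained_fraction))


def train_test_split(n_samples, options=Defaults(), seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * options.train_fraction)
    return indices[:n_train], indices[n_train:]


def use_parallel(n_samples, options=Defaults()):
    """Only pay the worker-pool start-up cost when the data set is large."""
    return n_samples >= options.parallel_threshold
```

A user overrides a single value with, say, `train_test_split(100, Defaults(train_fraction=0.5))` and leaves the rest alone.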
VISUALISATION
• Graphics use perceptually neutral colormaps
• Caters for colour vision deficiency (colour blindness)
• Colour-mapped PCA image scores and loadings plots
• Dialog box for Raman baseline removal
• Asymmetric least squares baseline modelling requires user input
• Confidence limits on PCA/CVA* scores plots
• Default = 95%, but user variable
• RMieS iteration change plot
*Canonical variates analysis
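For a two-dimensional scores plot, a confidence limit is commonly drawn as an ellipse scaled from the covariance of the scores. The sketch below assumes the scores are bivariate normal, in which case the scale factor is sqrt(-2 ln(1 - p)), the square root of the chi-squared quantile with two degrees of freedom. Whether the toolbox computes its limits exactly this way is not stated; the function names are hypothetical.

```python
import math


def ellipse_scale(confidence=0.95):
    """Radius, in standard deviations, enclosing `confidence` of a 2-D normal."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.sqrt(-2.0 * math.log(1.0 - confidence))


def ellipse_axes(var_x, var_y, cov_xy, confidence=0.95):
    """Semi-axis lengths (major, minor) of the confidence ellipse."""
    # Eigenvalues of the 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    mean_var = (var_x + var_y) / 2.0
    spread = math.sqrt(((var_x - var_y) / 2.0) ** 2 + cov_xy ** 2)
    k = ellipse_scale(confidence)
    major = k * math.sqrt(mean_var + spread)
    minor = k * math.sqrt(max(0.0, mean_var - spread))
    return major, minor
```

At the 95% default the scale factor is about 2.45 standard deviations, noticeably more than the 1.96 familiar from one-dimensional intervals.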
PERCEPTUALLY NEUTRAL COLORMAPS
[Figures: original, full colour versions of the MATLAB jet, MATLAB parula and Python viridis colormaps]
Python’s Inferno, Magma, Plasma and Viridis colormaps implemented in MATLAB
2020 VISION
(IF I HAD A TIME MACHINE)
• Developed more tests
• Added support for discrete wavenumber data
• Separated data storage from data manipulation
• Used database (SQLite) to manage metadata
• Considered OOP for data storage, but functional programming for operations
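The SQLite idea can be sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical. The point is that a foreign key with `ON DELETE CASCADE` lets the database remove a spectrum's labels when the spectrum itself is removed, so the two cannot fall out of step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE spectrum (
        id       INTEGER PRIMARY KEY,
        filename TEXT NOT NULL
    );
    CREATE TABLE label (
        spectrum_id INTEGER NOT NULL
                    REFERENCES spectrum(id) ON DELETE CASCADE,
        name        TEXT NOT NULL,
        value       TEXT NOT NULL
    );
    """
)

conn.executemany(
    "INSERT INTO spectrum (id, filename) VALUES (?, ?)",
    [(1, "a.spc"), (2, "b.spc")],
)
conn.executemany(
    "INSERT INTO label (spectrum_id, name, value) VALUES (?, ?, ?)",
    [(1, "class", "beta"), (2, "class", "gamma")],
)

# Removing a spectrum also removes its labels; no manual bookkeeping.
conn.execute("DELETE FROM spectrum WHERE id = ?", (1,))
remaining = conn.execute("SELECT spectrum_id, value FROM label").fetchall()
```

After the delete, `remaining` holds only the label belonging to spectrum 2.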
2020 VISION
(IF I HAD A TIME MACHINE)
Write it all in Python!
…or C++
