How AI, OpenAI, and ChatGPT impact business and software.
Machine learning for Science and Society
1. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Faculty of Science
Machine Learning for Science and Society
Some recent results
Christian Igel
Department of Computer Science
igel@diku.dk
Slide 1/30
2. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Why machine learning?
• Sometimes problems cannot be solved in the traditional way
because
• the designer’s knowledge is limited, and/or
• the sheer complexity and variability precludes an
accurate description.
• However, large amounts of data describing the task are often
available or can be automatically obtained – and machine
learning can solve the problem.
Slide 4/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
3. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Why machine learning?
• Sometimes problems cannot be solved in the traditional way
because
• the designer’s knowledge is limited, and/or
• the sheer complexity and variability precludes an
accurate description.
• However, large amounts of data describing the task are often
available or can be automatically obtained – and machine
learning can solve the problem.
Slide 5/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
Machine learning turns data into knowledge
Machine learning automates the process of
inductive inference:
1 Observe a phenomenon
2 Construct a model of that phenomenon
3 Make predictions using this model
4. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Alzheimer’s Disease progression
Disease progression
Cells
healthy neurons formation of A¡ plaques
and NFTs
neuronal death
NFTs
A plaques
MRI
structure normal size/shape normal size/shape abnormal size/shape
(atrophy)
MRI
intensity normal variation abnormal variation abnormal variation
Scale
Slide 7/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
5. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Hippocampal texture from structural MRI
• Image analysis of brain scans
• Segmentation of brain regions
(hippocampi)
• Extraction of image features
• Classification using SVMs
Sørensen, Igel, Hansen, Osler, Lauritzen, Rostrup, Nielsen. Early detection of Alzheimer’s disease using MRI hippocampal
texture. Human Brain Mapping, 2016
Slide 8/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
6. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
CADDementia Challenge
• Computer-Aided Diagnosis of Dementia based on structural
MRI data (CADDementia) competition
• Goal: Diagnosis into Alzheimer’s disease (AD), mild cognitive
impairment (MCI), and normal controls (NC)
• No labelled training data was provided, we trained on freely
available data sets.
• We combined several biomarkers (cortical thickness
measurements, hippocampal shape, hippocampal texture,
and volumetric measurements).
Sørensen, Pai, Anker, Balas, Lillholm, Igel, Nielsen. Dementia Diagnosis using MRI Cortical Thickness, Shape, Texture,
and Volumetry. MICCAI 2014 – Challenge on Computer-Aided Diagnosis of Dementia Based on Structural MRI Data, 2014
Bron et al. Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: The
CADDementia challenge. NeuroImage, 2015
Slide 9/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
7. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
CADDementia: Results
Slide 10/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
• Our biomarker outperforms the state-of-the-art.
• The texture score remains significant if combined with
other biomarkers.
• The biomarker generalizes across patient populations.
8. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Deep learning
• Deep learning refers to ML architectures
composed of multiple levels of non-linear
transformations.
• Idea: Extracting more and more abstract
features from input data, learning more
abstract representations.
• Representations can be learned in a
supervised and/or unsupervised manner.
• Example: Convolutional neural networks
(CNNs) are popular deep learning
architectures.
LeCun, Bottou, Bengio, Haffner. Gradient-based learning applied to
document recognition. Proceedings of the IEEE, 1998
Bengio. Learning Deep Architectures
for AI. Foundations and Trends in Ma-
chine Learning 2(1): 1–127, 2009
Slide 12/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
9. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Breast cancer
• Breast cancer is a frequent cause of death among women
• Screening programs halve the risk of death1 and reduce
mortality by 28-36 %2
• Still: 33 % of cancers are missed,3 70 % of referrals are false
positives,4 and 25 % of cancers could have been detected
earlier5
• Goal: More accurate image-based biomarkers allowing
personalized breast cancer screening
1Otto et al. Cancer Epidemiol Biomarkers Prev 21, 2012
2Broeders et al. J Med Screen 19, 2012
3Karssemeijer et al. Radiology 227, 2003
4Yankaskas et al. Am J Roentgenol 177, 2001
5Timmers et al. Eur J Public Health, 2012
Slide 13/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
10. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Breast density scoring
Breast density is related to breast cancer risk, the risk of missing
breast cancer, and the risk of false positive referral.
input manual CNN
Current work: Better biomarkers using CNN density scores and
CNN texture scores
Petersen, Nielsen, Diao, Karssemeijer, Lillholm. Breast Tissue Segmentation and Mammographic Risk Scoring Using Deep
Learning. Digital Mammography / IWDM, 2014
Slide 14/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
11. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Knee cartilage segmentation
• Cartilage segmentation in knee MRI is the method of choice
for quantifying cartilage deterioration.
• Cartilage deterioration implies osteoarthritis, one of the major
reasons for work disability in the western world.
input manual CNN
Prasoon, Petersen, Igel, Lauze, Dam, Nielsen. Deep Feature Learning for Knee Cartilage Segmentation Using a Triplanar
Convolutional Neural Network. Medical Image Computing and Computer Assisted Intervention (MICCAI), LNCS 8150, 2013
Slide 16/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
12. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Nuclear test ban treaty verification
• CTBTO (Peparatory Commission for the Comprehensive
Nuclear-Test-Ban Treaty Organisation) watches out for
nuclear weapon tests
• Verification of the comprehensive nuclear-test-ban treaty
• 15 GB of data every day from world-wide sensor network
• Data analysis must be partly automatic
Tuma et al. Integrated optimization of long-range underwater signal detection, feature ex-
traction, and classification for nuclear treaty monitoring. IEEE Transactions on Geoscience
and Remote Sensing, to appear
Slide 18/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
13. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Typical tasks in astronomy
Slide 20/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
“Can you group all detected objects into classes?
For instance, into stars, galaxies, and other objects
. . . ’ ?’
“Can you find weird objects (which look different
from other ones)?”
“Can you find weird objects (which look different
from other ones)?”
“I am interested in these distant galaxies. Can you
find more of them? Can you find the most distant
ones?”
14. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Big data
Today’s and future projects
• Sloan Digital Sky Survey (SDSS) = 50TB in total
• Large Synoptic Survey Telescope (LSST) = 30TB per night
• Low Frequency Array (LOFAR) = 250TB per night
• Square Kilometer Array (SKA) = 250TB per second
Huge amounts of photometric data, however, detailed
spectroscopic follow-ups are expensive.
Slide 21/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
15. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Specific star formation rate
• Goal: Predict specific star formation rate based on
photometric data
• Approach: Machine learning (ML) augmenting standard
features with image texture features
Slide 22/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
16. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Texture features
Original image
Scale-space
Curvedness
Shape Index
Shape Index
weighted by
Curvedness
Slide 23/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
17. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Predicting the specific star formation rate
• ML achieves higher accuracy than physical modelling
• New data structures and GPU programming allow to process
huge amounts of data
Stensbo-Smidt, Igel, Zirm, Steenstrup Pedersen. Nearest Neighbour Regression Outperforms Model-based Prediction of
Specific Star Formation Rate. IEEE Big Data 2013, 2013
Steenstrup Pedersen, Stensbo-Smidt, Zirm, Igel. Shape Index Descriptors Applied to Texture-Based Galaxy Analysis.
International Conference on Computer Vision (ICCV), 2013
Gieseke, Heinermann, Oancea, Igel. Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs. International
Conference on Machine Learning (ICML), 2014
Slide 24/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
18. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Outline
1 Machine Learning
2 Medicine
Diagnosis and Prognosis of Alzheimer’s Disease
Analysis of Mammograms
Knee Cartilage Segmentation
3 Nuclear-Test-Ban Treaty Monitoring
4 Astronomy
5 End Titles
Slide 26/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
19. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Shark
image.diku.dk/shark
Igel, Glasmachers, Heidrich-Meisner: Shark, Journal of Machine Learning Research 9:993–996, 2008
Gold Prize at Open Source Software Wold Challenge 2011
Slide 27/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
20. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Buffer k-d trees
http://bufferkdtree.readthedocs.org
Gieseke, Heinermann, Oancea, Igel. Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs. International
Conference on Machine Learning (ICML), 2014
Slide 28/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
21. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Industrial Data Analysis Service (IDAS)
Slide 29/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk
22. U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Thanks
Machine Learning Lab
http://image.diku.dk/MLLab
Image Section
http://www.diku.dk/english/research/imagesection
Special thanks: Sune Darkner, Fabian Gieseke, Mads Nielsen,
Cosmin Oancea, Akshay Pai, Kim Steenstrup Pedersen, Kai
Lars Polsterer, Lauge Sørensen, Kristoffer Stensbo-Smidt
Slide 30/30 — Christian Igel — Machine Learning for Science and Society — igel@diku.dk