Scientific visualization of data
Research Methodology
Seppo Karrila
September 2017 (2560 Thai)
Executive summary
• This lecture is about visualizing sets of experimental data from the lab, usually smallish, for use in science
– It is NOT about beautiful, impressive, artistic, and emotionally enticing “infographics” for average consumers
• The emphasis is on insights: creating or corroborating hypotheses, or assessing research questions
The purpose of visualization
• Sight is your most important sense
– One look at an image provides lots of information
very rapidly – a picture is worth a thousand words
– You are inherently good at detecting patterns
visually
• Spotting an “outlier” in a table is very difficult; in a graph it is often easy
• The real purpose is “insights”: getting a higher-level summary that may be useful
Fun quotes to know
• The purpose of computing is insight, not
numbers. (Richard Hamming)
• The purpose of the numbers computed is not
yet in sight. (unknown computer simulation
expert)
What kinds of insights are useful?
• Confirmatory (often “corroborating” would be a better word)
– You already have a hypothesis, or expectation of a pattern, now you
look for corroborating evidence in NEW data
• Creating new hypotheses
– Observing patterns that serve as “research questions”: will this
pattern continue or repeat, is it present in other datasets (from
another time, another experiment, or another region)
– You CAN’T CONFIRM A HYPOTHESIS FROM THE SAME DATA THAT GAVE
IT
• This is a problem, for example, in climate science, since we have only one history. With the last 10 years of data we can at best confirm a hypothesis that was made from similar data at least ten years ago. And for “climate” ten years is too short anyway…
• So if I create a hypothesis from the available data and publish it, and then you go and “confirm it” in the same data, this is fundamentally nonsense.
• Recall that predictive models need three data sets: training, validation, and test sets.
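As a reminder of what the three data sets look like in practice, here is a minimal sketch of a random three-way split. The function name `three_way_split`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration, not something prescribed in this lecture.

```python
import random

def three_way_split(rows, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle the rows and cut them into training, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * frac_train)
    n_val = round(len(shuffled) * frac_val)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever is left over
    return train, val, test

# Hypothetical data: 20 observations identified by their index.
train, val, test = three_way_split(range(20))
print(len(train), len(val), len(test))
```

The hypothesis (or model) is formed on the training set, tuned on the validation set, and only the untouched test set can corroborate it.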
More about confirmatory evidence
• The only way to prove causation:
– You can turn factor A on and off. Every time you turn it on, B happens soon after. This convinces you that “A on” causes B. Note that this requires experiments with manipulation of A!
• Observational data (no manipulated variables) cannot
prove causation
– It can disprove it though. If B happened first, then it was not
caused by A. This is the foundation of “Granger causality”.
– If you are observing a game between intelligent opponents,
then B could anticipate moves by A and try to counter them
ahead of time. So the future opportunities of A can cause
anticipating reactions by B. This is not how natural phenomena
work, so Granger should be OK for us.
• In any case, a game should not be modeled by simple one-time-step
rules, but natural phenomena mostly obey such difference or
differential models. In other words, the simple concept of “causation”
is not appropriate for games between intelligent players.
So keep in mind…
• Statistical tests that claim to prove A affects B usually
prove no such thing at all. The causation comes from
your understanding of the world, and statistics helps
you convince others…
– It is a bit of black magic there
– For example, the p-value is this:
• Assuming given statistical distributions (often Gaussian normal) and that your null hypothesis is correct, the probability that the chosen statistic summarizing the data would come out at least as extreme as the value you actually observed.
• It takes a lot of magic to now say: small p clearly means causation.
Statistics only shows correlations… it can’t tell if A caused B or the
other way around, or if C influenced both A and B
So let’s get to visualizations…
• In Excel, Home > Conditional Formatting highlights cells according to their values, which lets you see the numbers in a table
• You can also play with Insert > Sparklines, which makes tiny graphs within cells
• It is easy to spot the smallest and largest values and get some impression of the distribution
Random numbers
0.731033584
0.806055053
0.988103245
0.809884417
0.756027069
0.462190297
0.910670142
0.906566945
0.501780587
0.181802984
0.659130022
0.32821301
0.111329819
0.617390297
0.252291447
0.990308253
0.274208995
0.614407383
0.298483381
0.526614001
0.01251721
Scatter plots in Excel
• Illustration of Simpson’s paradox (from Wikipedia)
– Ignoring a factor can give a completely wrong trend
• Seppo’s paradox
– One single failed experiment can give a high R²
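Both paradoxes can be reproduced with a few lines of arithmetic. The sketch below uses made-up points (not the data behind the slide's figures) and a hypothetical helper `fit_line` that computes an ordinary least-squares line and its R².

```python
def fit_line(points):
    """Ordinary least-squares line through (x, y) points.

    Returns (slope, intercept, r_squared).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)

# Simpson's paradox: each group rises, but group B sits lower and to the right,
# so the pooled trend (ignoring the group factor) points the other way.
group_a = [(1, 6), (2, 7), (3, 8), (4, 9)]
group_b = [(6, 1), (7, 2), (8, 3), (9, 4)]
slope_a = fit_line(group_a)[0]
slope_b = fit_line(group_b)[0]
slope_pooled = fit_line(group_a + group_b)[0]
print("slopes (A, B, pooled):", slope_a, slope_b, slope_pooled)

# One failed experiment: a trendless cloud plus a single far-away point.
cloud = [(1, 2), (2, 1), (1, 1), (2, 2)]
r2_cloud = fit_line(cloud)[2]
r2_with_outlier = fit_line(cloud + [(20, 20)])[2]
print("R^2 (cloud, cloud + outlier):", r2_cloud, r2_with_outlier)
```

In both cases a scatter plot with the groups (or the stray point) marked makes the problem obvious, while the summary numbers alone hide it.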
Trouble with Excel
• Even making a plot that shows Simpson’s paradox is difficult: Excel does not let you format the markers by some factor…
• However, most people can manipulate data in Excel, do some basic transformations, and delete an outlier that would spoil the analysis (i.e., a failed experiment)
– Statisticians can make up theories and criteria for what is an outlier. For an experimentalist, if you trust the experiment, then the point can be an interesting special case… What counts is whether the data is real or corrupted. So one person’s outlier can be the important special case for another.
• Remember to keep your raw data safe. Do the analysis, including deleting outliers, in a separate file, preferably in a separate folder altogether!
Pivot tables and charts
• These are excellent for inspecting the effects of multiple factors, especially when each factor has only two or three levels
• Note: often you want to “paste special”, choosing “values”, maybe also “transpose”
– Copying formulas instead of values can cause trouble
– The next page has a data table; explore it in Excel…
Data for pivoting
Starch Preproc Temp C A B E F awcrit minerr maxerr Mrcrit
cassava pregel 25 0.968875 -7.19313 4.136666 11.49491 4.369757 0.594448 -0.1752 0.267812 11.20288
cassava pregel 35 0.961012 -6.59564 4.80928 12.37645 3.90624 0.560223 -0.11034 0.200768 10.83981
cassava raw 25 1.084315 -9.35922 6.394428 11.54028 6.14054 0.584587 -0.15701 0.234026 12.88684
cassava raw 35 1.053811 -8.16377 6.670096 12.89617 5.238465 0.567919 -0.19874 0.207021 12.56244
mix pregel 25 0.970572 -7.23715 4.387437 12.13923 4.860418 0.661974 -0.18178 0.335665 12.89627
mix pregel 35 0.956754 -6.33248 5.169849 13.17968 3.939175 0.628266 -0.1169 0.189642 12.21953
mix raw 25 1.011224 -6.82134 7.067797 13.1484 5.482373 0.652614 -0.07943 0.114265 14.0632
mix raw 35 1.03495 -7.5668 6.548293 13.77577 4.80145 0.647217 -0.13632 0.238465 13.71736
rice pregel 25 0.976289 -7.87222 3.483088 11.59115 4.755083 0.649166 -0.13634 0.280165 12.27966
rice pregel 35 0.963513 -7.03764 4.010848 12.02805 3.972963 0.607599 -0.2066 0.231581 11.28119
rice raw 25 1.00602 -6.96414 7.248538 13.4071 5.88229 0.672905 -0.18423 0.139877 14.904
rice raw 35 1.023475 -7.41054 6.808834 13.77478 5.185373 0.655075 -0.20526 0.113226 14.20889
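For readers who want to check their Excel pivot table against an independent calculation, here is a minimal sketch of the same idea in plain Python. It copies only the Starch, Preproc, Temp, and Mrcrit columns from the table above; the helper name `pivot_mean` and the choice to average Mrcrit over the two temperatures are assumptions made for illustration.

```python
from collections import defaultdict

# Columns Starch, Preproc, Temp, Mrcrit copied from the data table.
rows = [
    ("cassava", "pregel", 25, 11.20288), ("cassava", "pregel", 35, 10.83981),
    ("cassava", "raw", 25, 12.88684), ("cassava", "raw", 35, 12.56244),
    ("mix", "pregel", 25, 12.89627), ("mix", "pregel", 35, 12.21953),
    ("mix", "raw", 25, 14.0632), ("mix", "raw", 35, 13.71736),
    ("rice", "pregel", 25, 12.27966), ("rice", "pregel", 35, 11.28119),
    ("rice", "raw", 25, 14.904), ("rice", "raw", 35, 14.20889),
]

def pivot_mean(rows, row_key, col_key, value):
    """Mean of column `value` for each (row_key, col_key) combination."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r[row_key], r[col_key])].append(r[value])
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}

# Starch as pivot rows, Preproc as pivot columns, mean Mrcrit in the cells.
table = pivot_mean(rows, row_key=0, col_key=1, value=3)
for starch in ("cassava", "mix", "rice"):
    print(starch, [round(table[(starch, p)], 3) for p in ("pregel", "raw")])
```

Swapping `row_key` and `col_key` corresponds to reordering the axis fields, which changes which comparison sits side by side.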
Reproduce this pivot chart…
• Note that you can sort the “axis fields”, and this affects the grouping
– You can select a primary comparison
How about fitting a model?
• There are very basic model options ready-made
as trendlines in Excel
• What you typically really have to do is this:
– Put your inputs and the targeted model output y in columns
– Guess starting values for the model parameters and calculate the model output ỹ with these
– For every data point, form the squared error (y − ỹ)²
– Sum the column of squared errors, then minimize the sum with Data > Solver, which adjusts the model parameters
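The recipe above can be sketched outside Excel as well. The example below is a simplified stand-in, not what Solver does internally: it assumes a hypothetical one-parameter model ỹ = exp(−k·t), uses noise-free synthetic data generated with k = 0.3, and replaces the "starting guess" with a bracket [0, 2] searched by golden-section search, which is only valid when the error sum has a single minimum in the bracket.

```python
import math

def sse(k, ts, ys):
    """Sum of squared errors between the data ys and the model exp(-k * t)."""
    return sum((y - math.exp(-k * t)) ** 2 for t, y in zip(ts, ys))

def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:               # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                     # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

# Synthetic "measurements": the input column t and the output column y.
ts = [0, 1, 2, 3, 4, 5]
ys = [math.exp(-0.3 * t) for t in ts]

k_fit = golden_section_min(lambda k: sse(k, ts, ys), 0.0, 2.0)
print("fitted k:", k_fit, "error sum:", sse(k_fit, ts, ys))
```

With several parameters the structure stays the same (one function for the error sum, one optimizer that adjusts the parameters), but a multi-dimensional optimizer is needed instead of a one-dimensional search.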
Note about the basic solver
• There is a better option freely available for download: search for DirectOptimizer (you need to install it as an add-in)
– It comes with a small manual that helps you get started
• The point
– If you need to fit the Arrhenius law, or any other model from physics or physical chemistry, then you pretty much have to do “nonlinear least squares” fitting
• Even if there is a “linearizing transformation”, the error sum also gets transformed, and the results can be poor because of this
– Much of the time you can do this in Excel…
Free statistics packages
• Check out JASP or JAMOVI
– The two are very similar; JASP has some special Bayesian statistics that are unconventional
– Note again that while people think of Bayesian
probability as causation, NO statistical test actually
proves anything about causality! (Bayesian networks
are sometimes called “causal networks”, which sounds
good but is absolutely misleading. JASP doesn’t do
them though.)
• The current version of JAMOVI is 0.8.0.5 (September 2017)
– It appears to get more frequent updates than JASP
Hands-on exploration of JAMOVI
• Basic functionality for
– Importing data
– Adjusting metadata on variables (type, levels)
– Inspecting basic statistics
– Plotting the correlation matrix
• Note
– You can’t get a matrix scatter plot of multiple
variables from Excel…
Iris data in JAMOVI
• It is easy to generate fancy plots of how the data are distributed.
• However, you can’t create classifiers in JAMOVI…
Significances of correlations
Correlation Matrix
              Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
Sepal.Length  —             -0.118       0.872 ***     0.818 ***
Sepal.Width                 —            -0.428 ***    -0.366 ***
Petal.Length                             —             0.963 ***
Petal.Width                                            —
Note. * p < .05, ** p < .01, *** p < .001
Note: copy/paste to Word works well, not so well from JAMOVI to PowerPoint. I
used OneNote as intermediate to get this into PowerPoint… Less than perfect.
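The entries of such a correlation matrix are plain Pearson coefficients, which are easy to compute by hand. The sketch below uses four short made-up columns (not the iris data) and a hypothetical helper `pearson_r`; it does not compute the p-values behind the significance stars.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up columns, for illustration only.
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # exact multiple of a
    "c": [5, 4, 3, 2, 1],    # a reversed
    "d": [1, 3, 2, 5, 4],    # a with some scrambling
}
names = list(columns)
matrix = {(p, q): pearson_r(columns[p], columns[q]) for p in names for q in names}

# Print the upper triangle, like the table above.
for i, p in enumerate(names):
    cells = [f"{matrix[(p, q)]:+.3f}" if j > i else "" for j, q in enumerate(names)]
    print(f"{p:>2}", *[f"{cell:>7}" for cell in cells])
```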
The point of correlations?
• IF some variable is assumed causal, then the
trends of effects are important
– B increases or decreases with the manipulated
variable A
• If two independently measured variables have a high correlation, then neither is badly corrupted by noise
– Correlation indicates that there is mutual information; a variable that carries no information about anything else might as well be noise
• Pretty nice matrix scatterplot from JAMOVI
A first look at DataWarrior
• Current version 4.6.1 from Openmolecules.org
• Even if you run 64-bit Windows, take the 32-bit version; it can handle large enough data sets
• This is a freely available, professional-quality software package
– Too many features to cover… several tutorials are available on YouTube
Iris data again
• I selected all columns in the data view of JAMOVI, copied, pasted into Excel, and put back the column labels
• Then I did “paste special” with headers to get the data into DataWarrior
• In DataWarrior it is easy to assign marker color, size, etc., to a feature or variable, so one plot can display multiple dimensions.
• 3-D scatterplots are also easy to make and manipulate…
What I encourage is this…
• Get yourself free software
– Then learning to use it is a safe investment, because
you are not cut off by fees or licensing
• The first thing to do with new data is to look at it.
Let the data guide you more than your own prior
assumptions.
– It is the big effects that are important, and you should be able to see them
– Statistically significant differences almost always emerge if you just collect enough samples; checking significances is to a large part a ritual without much practical meaning
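The claim about sample size can be checked with a small calculation. The sketch below is an idealization: it assumes a two-sided z-test with known standard deviation, and it assumes the observed mean shift equals a fixed tiny effect (0.05 standard deviations) at every sample size. The function name `two_sided_p` and the numbers are chosen for illustration.

```python
import math

def two_sided_p(effect, sd, n):
    """Two-sided p-value of a z-test for a mean shift `effect`,
    with known standard deviation `sd` and `n` samples."""
    z = effect * math.sqrt(n) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# The same tiny effect, tested with more and more samples.
sizes = (10, 100, 1000, 10000, 100000)
p_values = [two_sided_p(effect=0.05, sd=1.0, n=n) for n in sizes]
for n, p in zip(sizes, p_values):
    print(n, p)
```

The effect never gets any bigger, yet the p-value keeps shrinking as n grows, which is why a look at the size of the effect in a plot tells more than the significance alone.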
Conclusions
• Most people are handy with Excel and use it to collect and
manipulate data
– Its visualization abilities are very limited. See how far it can take you… maybe it is enough
– It is good for transforming data by calculating new columns
• There is now free software for some basic exploratory
plotting and statistics
– JASP and JAMOVI appear convenient for a non-statistician
• For industrial-strength visualizations, DataWarrior is a free desktop application
• None of the above is for learning classifiers or for doing nonlinear regression… but you can do basic nonlinear regression easily in Excel, with some manual labor
– Get the DirectOptimizer add-in, at no cost

More Related Content

What's hot

Week4 Ensure Analysis Is Accurate And Complete
Week4 Ensure Analysis Is Accurate And CompleteWeek4 Ensure Analysis Is Accurate And Complete
Week4 Ensure Analysis Is Accurate And Completehapy
 
6 Modelling Purposes
6 Modelling Purposes6 Modelling Purposes
6 Modelling PurposesBruce Edmonds
 
Improving predictions: Lasso, Ridge and Stein's paradox
Improving predictions: Lasso, Ridge and Stein's paradoxImproving predictions: Lasso, Ridge and Stein's paradox
Improving predictions: Lasso, Ridge and Stein's paradoxMaarten van Smeden
 
Research Method for Business chapter 12
Research Method for Business chapter 12Research Method for Business chapter 12
Research Method for Business chapter 12Mazhar Poohlah
 
Aed1222 lesson 4
Aed1222 lesson 4Aed1222 lesson 4
Aed1222 lesson 4nurun2010
 
Introduction to the statistics project
Introduction to the statistics projectIntroduction to the statistics project
Introduction to the statistics projectpmakunja
 
Data visualization via Tableau solving an excel problem
Data visualization via Tableau solving an excel problemData visualization via Tableau solving an excel problem
Data visualization via Tableau solving an excel problemVivAde1
 
Introduction To SPSS
Introduction To SPSSIntroduction To SPSS
Introduction To SPSSPhi Jack
 
Bmgt 311 chapter_12
Bmgt 311 chapter_12Bmgt 311 chapter_12
Bmgt 311 chapter_12Chris Lovett
 
Data analysis &amp; interpretation
Data analysis &amp; interpretationData analysis &amp; interpretation
Data analysis &amp; interpretationavid
 
Slayter on planning quant design for flc projects - may 2011
Slayter   on planning quant design for flc projects - may 2011Slayter   on planning quant design for flc projects - may 2011
Slayter on planning quant design for flc projects - may 2011Elspeth Slayter
 
Lesson 10 rm psych stats & graphs 2013
Lesson 10   rm psych stats & graphs 2013Lesson 10   rm psych stats & graphs 2013
Lesson 10 rm psych stats & graphs 2013coburgpsych
 

What's hot (20)

Week4 Ensure Analysis Is Accurate And Complete
Week4 Ensure Analysis Is Accurate And CompleteWeek4 Ensure Analysis Is Accurate And Complete
Week4 Ensure Analysis Is Accurate And Complete
 
6 Modelling Purposes
6 Modelling Purposes6 Modelling Purposes
6 Modelling Purposes
 
Data analysis01 singlevariable
Data analysis01 singlevariableData analysis01 singlevariable
Data analysis01 singlevariable
 
Business Basic Statistics
Business Basic StatisticsBusiness Basic Statistics
Business Basic Statistics
 
Improving predictions: Lasso, Ridge and Stein's paradox
Improving predictions: Lasso, Ridge and Stein's paradoxImproving predictions: Lasso, Ridge and Stein's paradox
Improving predictions: Lasso, Ridge and Stein's paradox
 
Research Method for Business chapter 12
Research Method for Business chapter 12Research Method for Business chapter 12
Research Method for Business chapter 12
 
ICAR-IFPRI - Basic Research Questions lecture 1 - Devesh Roy, IFPRI
ICAR-IFPRI - Basic Research Questions lecture 1 - Devesh Roy, IFPRIICAR-IFPRI - Basic Research Questions lecture 1 - Devesh Roy, IFPRI
ICAR-IFPRI - Basic Research Questions lecture 1 - Devesh Roy, IFPRI
 
Aed1222 lesson 4
Aed1222 lesson 4Aed1222 lesson 4
Aed1222 lesson 4
 
Introduction to the statistics project
Introduction to the statistics projectIntroduction to the statistics project
Introduction to the statistics project
 
Data visualization via Tableau solving an excel problem
Data visualization via Tableau solving an excel problemData visualization via Tableau solving an excel problem
Data visualization via Tableau solving an excel problem
 
Introduction to spss
Introduction to spssIntroduction to spss
Introduction to spss
 
Introduction To SPSS
Introduction To SPSSIntroduction To SPSS
Introduction To SPSS
 
Bmgt 311 chapter_12
Bmgt 311 chapter_12Bmgt 311 chapter_12
Bmgt 311 chapter_12
 
Tqm new tools
Tqm new toolsTqm new tools
Tqm new tools
 
Data analysis &amp; interpretation
Data analysis &amp; interpretationData analysis &amp; interpretation
Data analysis &amp; interpretation
 
Using SPSS: A Tutorial
Using SPSS: A TutorialUsing SPSS: A Tutorial
Using SPSS: A Tutorial
 
Slayter on planning quant design for flc projects - may 2011
Slayter   on planning quant design for flc projects - may 2011Slayter   on planning quant design for flc projects - may 2011
Slayter on planning quant design for flc projects - may 2011
 
Decision tree
Decision treeDecision tree
Decision tree
 
Statistical Power
Statistical PowerStatistical Power
Statistical Power
 
Lesson 10 rm psych stats & graphs 2013
Lesson 10   rm psych stats & graphs 2013Lesson 10   rm psych stats & graphs 2013
Lesson 10 rm psych stats & graphs 2013
 

Similar to L8 scientific visualization of data

End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning ProjectEng Teong Cheah
 
Analysing The Data
Analysing The DataAnalysing The Data
Analysing The DataAngel Evans
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
Data Analysis Toolkit_Final v1.0
Data Analysis Toolkit_Final v1.0Data Analysis Toolkit_Final v1.0
Data Analysis Toolkit_Final v1.0lee_anderson40
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9Roger Barga
 
Presentation of Project and Critique.pptx
Presentation of Project and Critique.pptxPresentation of Project and Critique.pptx
Presentation of Project and Critique.pptxBillyMoses1
 
PG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data AnalysisPG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data AnalysisAashish Patel
 
Data Science Folk Knowledge
Data Science Folk KnowledgeData Science Folk Knowledge
Data Science Folk KnowledgeKrishna Sankar
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreTuri, Inc.
 
03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdfSugumarSarDurai
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial IndustrySubrat Panda, PhD
 
Introduction - Using Stata
Introduction - Using StataIntroduction - Using Stata
Introduction - Using StataRyan Herzog
 

Similar to L8 scientific visualization of data (20)

Lecture 1
Lecture 1Lecture 1
Lecture 1
 
lec1.ppt
lec1.pptlec1.ppt
lec1.ppt
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Analysing The Data
Analysing The DataAnalysing The Data
Analysing The Data
 
CS194Lec0hbh6EDA.pptx
CS194Lec0hbh6EDA.pptxCS194Lec0hbh6EDA.pptx
CS194Lec0hbh6EDA.pptx
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Data Analysis Toolkit_Final v1.0
Data Analysis Toolkit_Final v1.0Data Analysis Toolkit_Final v1.0
Data Analysis Toolkit_Final v1.0
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9
 
Presentation of Project and Critique.pptx
Presentation of Project and Critique.pptxPresentation of Project and Critique.pptx
Presentation of Project and Critique.pptx
 
PG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data AnalysisPG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data Analysis
 
Spss basics
Spss basicsSpss basics
Spss basics
 
Data Science Folk Knowledge
Data Science Folk KnowledgeData Science Folk Knowledge
Data Science Folk Knowledge
 
Week_2_Lecture.pdf
Week_2_Lecture.pdfWeek_2_Lecture.pdf
Week_2_Lecture.pdf
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignore
 
CHAPTER 7.pptx
CHAPTER 7.pptxCHAPTER 7.pptx
CHAPTER 7.pptx
 
03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Introduction - Using Stata
Introduction - Using StataIntroduction - Using Stata
Introduction - Using Stata
 
Tqm old tools
Tqm old toolsTqm old tools
Tqm old tools
 

More from Seppo Karrila

L5 format and substance of thesis
L5 format and substance of thesisL5 format and substance of thesis
L5 format and substance of thesisSeppo Karrila
 
L4 research proposal
L4 research proposalL4 research proposal
L4 research proposalSeppo Karrila
 
L3 hypothesis or research question
L3 hypothesis or research questionL3 hypothesis or research question
L3 hypothesis or research questionSeppo Karrila
 
How to run a meeting
How to run a meetingHow to run a meeting
How to run a meetingSeppo Karrila
 
On practical philosophy of research in science and technology
On practical philosophy of research in science and technologyOn practical philosophy of research in science and technology
On practical philosophy of research in science and technologySeppo Karrila
 
Lecture3 elementary optimization
Lecture3 elementary optimizationLecture3 elementary optimization
Lecture3 elementary optimizationSeppo Karrila
 
Scale-up and scale-down of chemical processes
Scale-up and scale-down of chemical processesScale-up and scale-down of chemical processes
Scale-up and scale-down of chemical processesSeppo Karrila
 
About your graduate studies part 2
About your graduate studies part 2About your graduate studies part 2
About your graduate studies part 2Seppo Karrila
 
About your graduate studies part 1
About your graduate studies part 1About your graduate studies part 1
About your graduate studies part 1Seppo Karrila
 
Projects, promotions, and the Peter principle
Projects, promotions, and the Peter principleProjects, promotions, and the Peter principle
Projects, promotions, and the Peter principleSeppo Karrila
 
Selecting experimental variables for response surface modeling
Selecting experimental variables for response surface modelingSelecting experimental variables for response surface modeling
Selecting experimental variables for response surface modelingSeppo Karrila
 
How to review a journal paper and prepare oral presentation
How to review a journal paper and prepare oral presentationHow to review a journal paper and prepare oral presentation
How to review a journal paper and prepare oral presentationSeppo Karrila
 

More from Seppo Karrila (12)

L5 format and substance of thesis
L5 format and substance of thesisL5 format and substance of thesis
L5 format and substance of thesis
 
L4 research proposal
L4 research proposalL4 research proposal
L4 research proposal
 
L3 hypothesis or research question
L3 hypothesis or research questionL3 hypothesis or research question
L3 hypothesis or research question
 
How to run a meeting
How to run a meetingHow to run a meeting
How to run a meeting
 
On practical philosophy of research in science and technology
On practical philosophy of research in science and technologyOn practical philosophy of research in science and technology
On practical philosophy of research in science and technology
 
Lecture3 elementary optimization
Lecture3 elementary optimizationLecture3 elementary optimization
Lecture3 elementary optimization
 
Scale-up and scale-down of chemical processes
Scale-up and scale-down of chemical processesScale-up and scale-down of chemical processes
Scale-up and scale-down of chemical processes
 
About your graduate studies part 2
About your graduate studies part 2About your graduate studies part 2
About your graduate studies part 2
 
About your graduate studies part 1
About your graduate studies part 1About your graduate studies part 1
About your graduate studies part 1
 
Projects, promotions, and the Peter principle
Projects, promotions, and the Peter principleProjects, promotions, and the Peter principle
Projects, promotions, and the Peter principle
 
Selecting experimental variables for response surface modeling
Selecting experimental variables for response surface modelingSelecting experimental variables for response surface modeling
Selecting experimental variables for response surface modeling
 
How to review a journal paper and prepare oral presentation
How to review a journal paper and prepare oral presentationHow to review a journal paper and prepare oral presentation
How to review a journal paper and prepare oral presentation
 

Recently uploaded

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 

Recently uploaded (20)

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx

L8 scientific visualization of data

  • 1. Scientific visualization of data Research Methodology Seppo Karrila September 2017 (2560 Thai)
  • 2. Executive summary • This is about visualizing usually smallish sets of experimental data from lab, for use in science – This is NOT about beautiful impressive artistic and emotionally enticing “infographics” to average consumers • Emphasis is on insights, creating or corroborating hypotheses, or assessing research questions
  • 3. The purpose of visualization • Sight is your most important sense – One look at an image provides lots of information very rapidly – a picture is worth a thousand words – You are inherently good at detecting patterns visually • Seeing an “outlier” in a table is very difficult, it is often easy to see in a graph • The real purpose is “insights”, getting higher level summary that may be useful
  • 4. Fun quotes to know • The purpose of computing is insight, not numbers. (Richard Hamming) • The purpose of the numbers computed is not yet in sight. (unknown computer simulation expert)
  • 5. What kinds of insights are useful? • Confirmatory (often “corroborating” would be a better word) – You already have a hypothesis, or expectation of a pattern, now you look for corroborating evidence in NEW data • Creating new hypotheses – Observing patterns that serve as “research questions”: will this pattern continue or repeat, is it present in other datasets (from another time, another experiment, or another region) – You CAN’T CONFIRM A HYPOTHESIS FROM THE SAME DATA THAT GAVE IT • This is a problem for example with climate science, since we only have one history. We can only possibly confirm, with last 10 years of data, a hypothesis that was made from similar data at least ten years ago. And for “climate” ten years is too short anyway… • So if I create a hypothesis from available data and publish it, then you go “confirm it” in the same data, this is foundationally absolute nonsense. • Recall that predictive models need three data sets: training, validation, and test sets.
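The last point of slide 5 (three data sets for predictive models) can be sketched as follows. This is only an illustration: the function name `three_way_split`, the 60/20/20 fractions, and the fixed seed are assumptions, not something the slides prescribe.

```python
import random

def three_way_split(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle rows and cut them into training, validation and test sets.

    The fractions and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(rows)          # copy, so the caller's data stays untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # whatever remains
    return train, validation, test

train, validation, test = three_way_split(range(100))
print(len(train), len(validation), len(test))
```

A hypothesis (or model) found in the training part is then checked only against rows it has never seen, which is the same idea as looking for corroboration in NEW data.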
  • 6. More about confirmatory evidence • The only way to prove causation: – You can turn factor A on/off. Every time you turn it on, soon after B happens. This convinces you that “A on” causes B. Note that this requires experiments with manipulation of A! • Observational data (no manipulated variables) cannot prove causation – It can disprove it though. If B happened first, then it was not caused by A. This is the foundation of “Granger causality”. – If you are observing a game between intelligent opponents, then B could anticipate moves by A and try to counter them ahead of time. So the future opportunities of A can cause anticipating reactions by B. This is not how natural phenomena work, so Granger should be OK for us. • In any case, a game should not be modeled by simple one-time-step rules, but natural phenomena mostly obey such difference or differential models. In other words, the simple concept of “causation” is not appropriate for games between intelligent players.
  • 7. So keep in mind… • Statistical tests that claim to prove A affects B usually prove no such thing at all. The causation comes from your understanding of the world, and statistics helps you convince others… – It is a bit of black magic there – For example, the p-value is this: • Assuming given statistical distributions (often Gaussian normal) and that your null hypothesis is correct, the probability that the chosen statistic summarizing your data would be at least as extreme as the value actually observed. • It takes a lot of magic to now say: small p clearly means causation. Statistics only shows correlations… it can’t tell if A caused B or the other way around, or if C influenced both A and B
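To make the p-value definition on slide 7 concrete, here is a minimal sketch. Note that it swaps in a permutation test: instead of assuming a Gaussian distribution, the null hypothesis "group labels do not matter" is simulated by reshuffling the labels. The function name, the example numbers, and the number of permutations are all assumptions made for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference of group means.

    Statistic: absolute difference of the two means.
    Null hypothesis: the group labels are interchangeable.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the measurements
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling itself as one permutation.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical measurements, not data from the slides.
p_separated = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
print(p_separated, p_identical)
```

A small p here only says the observed difference would be rare if the labels were irrelevant. As the slide stresses, it says nothing about which way any causation runs.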
  • 8. So let’s get to visualizations… • In Excel, Home > Conditional formatting allows seeing the numbers in a table. • You can also play with Insert > Sparklines which allows making tiny graphs within cells • Easy to spot smallest and largest, get some impression of distribution Random numbers 0.731033584 0.806055053 0.988103245 0.809884417 0.756027069 0.462190297 0.910670142 0.906566945 0.501780587 0.181802984 0.659130022 0.32821301 0.111329819 0.617390297 0.252291447 0.990308253 0.274208995 0.614407383 0.298483381 0.526614001 0.01251721
  • 9. Scatter plots in Excel • Illustration of Simpson’s paradox (from Wikipedia) – Ignoring a factor can give a completely wrong trend • Seppo’s paradox – One single failed experiment can give a high R²
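Both effects on slide 9 can be reproduced with a few lines of arithmetic. The numbers below are invented for illustration (they are not the data behind the slide's plots), and `fit_stats` is a hypothetical helper that returns the slope and R² of an ordinary least-squares line.

```python
def mean(values):
    return sum(values) / len(values)

def fit_stats(xs, ys):
    """Return (slope, r_squared) of the least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Simpson's paradox: each group rises with x, the pooled data falls.
group1_x, group1_y = [1, 2, 3, 4], [6, 7, 8, 9]
group2_x, group2_y = [6, 7, 8, 9], [1, 2, 3, 4]
slope_g1, _ = fit_stats(group1_x, group1_y)
slope_g2, _ = fit_stats(group2_x, group2_y)
slope_pooled, _ = fit_stats(group1_x + group2_x, group1_y + group2_y)
print("slopes:", slope_g1, slope_g2, slope_pooled)

# One far-away point (for example a failed experiment) dominates R².
cluster_x = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8]
cluster_y = [1.0, 0.8, 1.1, 1.2, 0.9, 1.0]
_, r2_clean = fit_stats(cluster_x, cluster_y)
_, r2_with_outlier = fit_stats(cluster_x + [10.0], cluster_y + [10.0])
print("R2 without / with outlier:", r2_clean, r2_with_outlier)
```

In a plot both problems are visible at a glance, which is the argument for looking at the scatter before trusting a trendline or an R² value.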
  • 10. Trouble with Excel • Even making a plot showing Simpson’s paradox is difficult, Excel does not allow formatting the markers by some factor… • However, most people can manipulate data in Excel, do some basic transformation, delete an outlier that would spoil the analysis (i.e., a failed experiment) – Statisticians can make up theories and criteria for what is an outlier. For an experimentalist, if you trust the experiment, then it can be an interesting special case… What counts is whether the data is real or corrupted. So one person’s outlier can be the important special case for another. • Remember to keep your raw data safe. Do the analysis, including deleting outliers, in a separate file, preferably in a separate folder altogether!
  • 11. Pivot tables and charts • These are excellent for inspecting effects of multiple factors, especially when each factor only has two or three levels • Note: often you want to “paste special” choosing “values”, maybe also “transpose” – Copying formulas instead of values can be trouble – Next page has a data table, explore it in Excel…
  • 12. Data for pivoting
    Starch   Preproc  Temp  C         A         B         E         F         awcrit    minerr    maxerr    Mrcrit
    cassava  pregel   25    0.968875  -7.19313  4.136666  11.49491  4.369757  0.594448  -0.1752   0.267812  11.20288
    cassava  pregel   35    0.961012  -6.59564  4.80928   12.37645  3.90624   0.560223  -0.11034  0.200768  10.83981
    cassava  raw      25    1.084315  -9.35922  6.394428  11.54028  6.14054   0.584587  -0.15701  0.234026  12.88684
    cassava  raw      35    1.053811  -8.16377  6.670096  12.89617  5.238465  0.567919  -0.19874  0.207021  12.56244
    mix      pregel   25    0.970572  -7.23715  4.387437  12.13923  4.860418  0.661974  -0.18178  0.335665  12.89627
    mix      pregel   35    0.956754  -6.33248  5.169849  13.17968  3.939175  0.628266  -0.1169   0.189642  12.21953
    mix      raw      25    1.011224  -6.82134  7.067797  13.1484   5.482373  0.652614  -0.07943  0.114265  14.0632
    mix      raw      35    1.03495   -7.5668   6.548293  13.77577  4.80145   0.647217  -0.13632  0.238465  13.71736
    rice     pregel   25    0.976289  -7.87222  3.483088  11.59115  4.755083  0.649166  -0.13634  0.280165  12.27966
    rice     pregel   35    0.963513  -7.03764  4.010848  12.02805  3.972963  0.607599  -0.2066   0.231581  11.28119
    rice     raw      25    1.00602   -6.96414  7.248538  13.4071   5.88229   0.672905  -0.18423  0.139877  14.904
    rice     raw      35    1.023475  -7.41054  6.808834  13.77478  5.185373  0.655075  -0.20526  0.113226  14.20889
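For readers without Excel at hand, the pivoting idea can be sketched in plain Python. The rows below copy only the Starch, Preproc, Temp and Mrcrit columns from the slide 12 table. The helper `pivot_mean` and the choice to average Mrcrit over temperature are assumptions for illustration, not necessarily the pivot shown in the slides.

```python
from collections import defaultdict

# (Starch, Preproc, Temp, Mrcrit) taken from the slide 12 data table.
rows = [
    ("cassava", "pregel", 25, 11.20288), ("cassava", "pregel", 35, 10.83981),
    ("cassava", "raw", 25, 12.88684), ("cassava", "raw", 35, 12.56244),
    ("mix", "pregel", 25, 12.89627), ("mix", "pregel", 35, 12.21953),
    ("mix", "raw", 25, 14.0632), ("mix", "raw", 35, 13.71736),
    ("rice", "pregel", 25, 12.27966), ("rice", "pregel", 35, 11.28119),
    ("rice", "raw", 25, 14.904), ("rice", "raw", 35, 14.20889),
]

def pivot_mean(rows, row_field, col_field, value_field):
    """Average value_field for every (row_field, col_field) combination."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = (row[row_field], row[col_field])
        sums[key] += row[value_field]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Rows: starch type, columns: preprocessing, cells: mean Mrcrit.
pivot = pivot_mean(rows, row_field=0, col_field=1, value_field=3)
for starch in ("cassava", "mix", "rice"):
    print(starch, pivot[(starch, "pregel")], pivot[(starch, "raw")])
```

Swapping `row_field` and `col_field` corresponds to re-sorting the axis fields in a pivot chart: the same numbers, grouped for a different primary comparison.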
  • 13. Reproduce this pivot chart… • Note that you can sort the “axis fields”, and this affects the grouping – You can select a primary comparison
  • 14. How about fitting a model? • There are very basic model options ready-made as trendlines in Excel • What you really have to do typically is this: – Your inputs and targeted model output y are in columns – You guess starting values for model parameters, calculate model output ỹ with these – For every data point you form squared error (y - ỹ)² – Sum the column of squared errors, then minimize the sum by using Data > Solver, which adjusts the model parameters
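The recipe on slide 14 translates almost line by line into code. In this sketch a very simple compass (coordinate) search stands in for Excel's Solver. It is not the algorithm Solver uses, just the easiest minimizer to write down. The exponential-decay model, the synthetic noise-free data, and the starting guess are all assumptions for illustration.

```python
import math

def sse(params, xs, ys, model):
    """Sum of squared errors between data ys and model predictions."""
    return sum((y - model(x, *params)) ** 2 for x, y in zip(xs, ys))

def compass_search(objective, start, step=0.5, tol=1e-8, max_iter=100000):
    """Minimize objective by trying +/- step on each parameter in turn.

    When no single-parameter move improves the objective, the step is halved.
    """
    best = list(start)
    best_val = objective(best)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = objective(trial)
                if val < best_val:
                    best, best_val = trial, val
                    improved = True
        if not improved:
            step /= 2
    return best, best_val

def decay(x, a, b):
    """Hypothetical model: y = a * exp(-b * x)."""
    return a * math.exp(-b * x)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]   # synthetic data, a=2, b=0.5

# Guess starting values, then let the search adjust them to minimize the SSE.
fitted, final_sse = compass_search(lambda p: sse(p, xs, ys, decay), start=[1.0, 1.0])
print(fitted, final_sse)
```

With real, noisy data the minimum SSE will not be zero, and a poor starting guess can matter, just as it does with Solver.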
  • 15. Note about the basic solver • There is a better option freely available for download, search for DirectOptimizer (you need to install it as add-in) – It comes with a small manual that helps you get started • The point – If you need to fit Arrhenius law, or whatever other model from physics or physical chemistry, then you pretty much have to do “nonlinear least squares” fitting • Even if there is a “linearizing transformation” the error sum gets also transformed, and the results can be poor because of this – much of the time you can do this in Excel…
  • 16. Free statistics packages • Check out JASP or JAMOVI – The two are very similar, JASP has some special Bayesian statistics that are unconventional – Note again that while people think of Bayesian probability as causation, NO statistical test actually proves anything about causality! (Bayesian networks are sometimes called “causal networks”, which sounds good but is absolutely misleading. JASP doesn’t do them though.) • JAMOVI current version is 0.8.0.5 – It appears to get more frequent updates than JASP
  • 17. Hands-on exploration of JAMOVI • Basic functionality for – Importing data – Adjusting metadata on variables (type, levels) – Inspecting basic statistics – Plotting the correlation matrix • Note – You can’t get a matrix scatter plot of multiple variables from Excel…
  • 18. Iris data in JAMOVI • It is easy to generate fancy plots of how the data are distributed. • However, you can’t create classifiers in JAMOVI…
  • 19. Significances of correlations
    Correlation Matrix
                  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
    Sepal.Length  —             -0.118       0.872 ***     0.818 ***
    Sepal.Width                 —            -0.428 ***    -0.366 ***
    Petal.Length                             —             0.963 ***
    Petal.Width                                            —
    Note. * p < .05, ** p < .01, *** p < .001
    Note: copy/paste to Word works well, not so well from JAMOVI to PowerPoint. I used OneNote as intermediate to get this into PowerPoint… Less than perfect.
  • 20. The point of correlations? • IF some variable is assumed causal, then the trends of effects are important – B increases or decreases with the manipulated variable A • If two independently measured variables have a high correlation, then neither is badly corrupted by noise – Correlation indicates there is mutual information, a variable that carries no information about anything else might as well be noise
  • 22. A first look at DataWarrior • Current version 4.6.1 from Openmolecules.org • Even if you run 64 bit Windows, take the 32 bit version – it can handle large enough data sets • This is a freely available professional quality software package – Too many features to cover… several tutorials are available on YouTube
  • 23. Iris data again • I selected all columns in data view of JAMOVI, copied, pasted to Excel, put back column labels • Then did “paste special” with headers to get into DataWarrior
  • 24. • In DataWarrior it is easy to assign marker color, size, etc., to a feature or variable, so one plot can display multiple dimensions.
  • 25. • 3-D scatterplots are also easy to make and manipulate…
  • 26. What I encourage is this… • Get yourself free software – Then learning to use it is a safe investment, because you are not cut off by fees or licensing • The first thing to do with new data is to look at it. Let the data guide you more than your own prior assumptions. – It is big effects that are important, you should be able to see them – Statistically significant differences almost always emerge if you just collect enough samples – checking significances is to a large part a ritual without much meaning for practice
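The claim on slide 26 that significance "almost always emerges" with enough samples can be checked with a short calculation. The sketch assumes a two-sample z-test with known, equal standard deviation, and it assumes the observed mean difference equals a small true difference. The numbers 0.05 and 1.0 are invented for illustration.

```python
import math

def two_sided_p_for_mean_difference(diff, sigma, n_per_group):
    """Two-sided p-value of a z-test for a difference of two group means.

    Assumes equal group sizes and a known common standard deviation sigma.
    """
    standard_error = sigma * math.sqrt(2.0 / n_per_group)
    z = diff / standard_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# Same tiny effect (5 % of one standard deviation), growing sample size.
for n in (100, 1000, 10000):
    print(n, two_sided_p_for_mean_difference(diff=0.05, sigma=1.0, n_per_group=n))
```

The effect size never changes, only the sample size does, yet the p-value drops from unremarkable to "highly significant". This is why the slide asks you to look for big effects first.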
  • 27. Conclusions • Most people are handy with Excel and use it to collect and manipulate data – It has some ability for visualization, but very limited. See how far it can take you… maybe it is enough – It is good for transforming data by calculating new columns • There is now free software for some basic exploratory plotting and statistics – JASP and JAMOVI appear convenient for a non-statistician • For industry-strength visualizations DataWarrior is a free desktop application • None of the above is for learning classifiers or for doing nonlinear regression… but you can do basic nonlinear regression easily in Excel, with some manual labor – Get DirectOptimizer add-in, at no cost