(AN INTRODUCTION)
REGRESSION ANALYSIS
FOR DATA-JOURNALISM
Camila Salazar
School of Data Fellow
@milamila07
Outline
1. Target audience
2. A step beyond descriptive statistics
3. What is regression analysis?
4. Example: the effect of education on wages
5. Other types of regression analysis useful in data
journalism.
6. Using regression models in data journalism
TARGET AUDIENCE
Target audience
• Data journalists
• School of Data Fellows
• People with basic knowledge of statistics
• Journalism students
A STEP BEYOND
DESCRIPTIVE STATISTICS
So you are in the newsroom...
There’s a big debate in your country about the importance of education.
Your editor asks you to write a story about the importance of education.
First step: descriptive statistics
You find data about education in your country and start
calculating the descriptive statistics.
Descriptive statistics
With descriptive statistics you find:
-How many people have a college degree.
-Unemployment according to the level of education.
And...
You interview young people who are still in high school
and don’t want to go to college. With your story, you want to
show them how much they could improve their future
earnings by going to college.
You can’t answer this question using descriptive
statistics :(
But...
You can calculate how much an extra year of schooling
increases wages using regression analysis!
WHAT IS REGRESSION
ANALYSIS?
What is regression analysis?
Regression analysis is a statistical tool for the
investigation of relationships between variables.
What is regression analysis?
It helps you explain how the value of a dependent
variable (Y) changes when an independent variable
(X) is varied, holding all other variables fixed.
What is regression analysis?
For example:
Health (Y): dependent variable
Vegetable consumption (X), exercise (X), sleep (X): independent variables
The linear regression
It’s a method for modeling the linear relationship between a
dependent variable Y and one or more explanatory variables.
[Equation shown as an image, with its parts labeled: dependent variable (Y), independent variable (X), coefficient (B), error term.]
We are interested in
estimating B (the
coefficient). It captures
the effect X has on Y,
holding all other factors
fixed.
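The equation on this slide was an image and did not survive extraction. A common way to write a simple linear regression that matches the slide's labels is sketched below. The intercept term and the Greek symbols are assumptions, not taken from the slide; the coefficient the slide calls B is written here as \beta_1.

```latex
Y = \beta_0 + \beta_1 X + \varepsilon
```

Here Y is the dependent variable, X the independent variable, \beta_1 the coefficient of interest, and \varepsilon the error term.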
The linear regression
For example you want to explain the effect of education on
wages.
[Figure: Wage, Education and Experience, marking the variation in wage that has to do with education and the variation in wage that has to do with experience.]
What is a linear regression?
• You have to formulate a hypothesis about the
relationships of interest.
• Have some theory behind your assumptions.
• There are some essential assumptions and
statistical properties of the regression that you
have to consider.
EXAMPLE: THE EFFECT
OF EDUCATION ON
WAGES
Example
• Database with 994 observations.
• 3 variables: wage (in dollars), experience, years of
education.
• The equation to estimate: wage as a linear function of years of
education and years of experience, plus an error term.
Example: coefficients
An additional year of education increases
wage by $161.68, holding all other factors
fixed.
An additional year of experience increases
wage by $16.54, holding all other factors
fixed.
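A minimal sketch of how coefficients like these could be estimated with ordinary least squares. The data below is synthetic, not the survey data behind the slides: the "true" effects (160 per year of education, 16 per year of experience), the noise level, and the helper names `solve` and `ols` are all assumptions chosen for illustration.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    A column of ones is prepended so the first coefficient is the intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic data (assumption): 994 people, wage depends on education and experience.
random.seed(0)
rows, wages = [], []
for _ in range(994):
    education = random.randint(6, 18)    # years of education
    experience = random.randint(0, 30)   # years of experience
    noise = random.gauss(0, 300)         # everything the model does not capture
    rows.append((education, experience))
    wages.append(200 + 160 * education + 16 * experience + noise)

intercept, b_education, b_experience = ols(rows, wages)
print(f"education: {b_education:.2f}, experience: {b_experience:.2f}")
```

The estimated coefficients should land close to the values used to generate the data. In practice a statistics package would also report standard errors and p-values for each coefficient.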
Example: p-value
But, what is the p-value?
Example: p-value
With statistics you can’t be 100% certain.
A relatively simple way to interpret p-values is
to think of them as representing how likely a result
at least this strong would be if only chance were at work.
Example: p-value
Null hypothesis: a hypothesis that the researcher tries to
disprove, reject or nullify.
“Education has NO explanatory power over wages”
“Men are NOT taller than women on average”
To test the null hypothesis we use the p-value.
Example: p-value
The p-value is the probability of seeing a result at least as extreme
as yours if the null hypothesis were true.
If your p-value is small (< 0.05), you have strong evidence to
reject the null hypothesis.
“Men are significantly taller than women, p=0.01.” That means that if men were
NOT actually taller than women, a difference this large would appear in only about 1%
of samples through random chance alone.
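One way to see what a p-value measures is to simulate the null hypothesis directly. The sketch below uses a permutation test, which is not the method the slides use for regression coefficients, and the heights are made-up numbers, not real measurements. It shuffles the group labels many times and counts how often chance alone produces a difference at least as large as the observed one.

```python
import random

random.seed(1)

# Hypothetical heights in cm (assumption, not real survey data).
men = [random.gauss(175, 7) for _ in range(50)]
women = [random.gauss(168, 7) for _ in range(50)]

def mean(values):
    return sum(values) / len(values)

observed = mean(men) - mean(women)

# Under the null hypothesis the labels "man" and "woman" carry no information
# about height, so shuffling them should produce similar differences.
pooled = men + women
n_permutations = 2000
at_least_as_large = 0
for _ in range(n_permutations):
    random.shuffle(pooled)
    diff = mean(pooled[:50]) - mean(pooled[50:])
    if diff >= observed:
        at_least_as_large += 1

# Share of shuffles that match or beat the observed difference
# (the +1 terms keep the estimate from being exactly zero).
p_value = (at_least_as_large + 1) / (n_permutations + 1)
print(f"observed difference: {observed:.2f} cm, p-value: {p_value:.4f}")
```

A small p-value here means that shuffled, label-free data rarely produces a gap as large as the one observed.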
Example
The p-value tells you if the coefficient is statistically significant.
With a low p-value (less than 10%, 5% or 1%) you can reject the null hypothesis
that the coefficient is equal to zero (it has no explanatory power). In this case,
the coefficients are significant. That means that education and experience have
explanatory power over wages.
Example
R-squared: This indicates
how well the explanatory
variables explain the
variability of the
dependent variable.
In this case: 33.8% of the variability of wage is
explained by the years of education and years of
experience.
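As a rough illustration of where a figure like this comes from, R-squared can be computed as one minus the ratio of unexplained variation to total variation. The five data points below are made up for the example and have nothing to do with the wage dataset.

```python
# Hypothetical toy data (assumption): x could be years of education,
# y a wage in hundreds of dollars.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Simple linear regression: slope and intercept by least squares.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

# Variation the model leaves unexplained vs. total variation in y.
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_total = sum((yi - mean_y) ** 2 for yi in y)

r_squared = 1 - ss_residual / ss_total
print(f"R-squared: {r_squared:.3f}")  # 0.600
```

In this toy case the fitted line explains 60% of the variability in y, and the remaining 40% is left in the residuals.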
OTHER TYPES OF
REGRESSION ANALYSIS
The logistic regression
Imagine you want to estimate the probability that a
person with a college degree is employed.
The linear regression wouldn’t be very useful.
The logistic regression
It is a regression model where the dependent variable (Y) is
categorical. For example (binary):
1 = unemployed, 0 = employed
It is used to estimate the probability of a binary response based
on one or more independent variables.
The logistic regression
Explanatory variables:
-Age
-Education
-Family income
-Occupation
[Diagram: explanatory variables, logistic regression, outcomes Employed / Unemployed]
The model would tell you, for example, that a person with a college degree is three times
more likely to be employed than a person who only went to high school.
The logistic regression
• The coefficients cannot be interpreted as the rate
of change in the dependent variable.
• You check the sign of the coefficients.
• You can calculate marginal effects or odds ratios
(logit).
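The points above can be sketched with a small example. The coefficients are hypothetical, chosen so that the odds ratio for a college degree comes out at 3. The outcome is coded 1 = employed here, which is an assumption made for readability and is the reverse of the coding shown earlier in the deck.

```python
import math

def logistic(z):
    """Map a linear score z to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical logit coefficients (assumption, not estimated from data).
intercept = 0.5
b_college = math.log(3)  # chosen so that exp(b_college) == 3

# Predicted probability of being employed, with and without a degree.
p_highschool = logistic(intercept)
p_college = logistic(intercept + b_college)

# Odds = p / (1 - p); the odds ratio compares the two groups.
odds_highschool = p_highschool / (1 - p_highschool)
odds_college = p_college / (1 - p_college)
odds_ratio = odds_college / odds_highschool

print(f"P(employed | high school) = {p_highschool:.3f}")
print(f"P(employed | college)     = {p_college:.3f}")
print(f"odds ratio                = {odds_ratio:.3f}")
```

The coefficient itself (about 1.1) is not a change in probability, but its sign says the degree raises the chance of employment, and exponentiating it gives the odds ratio. An odds ratio of 3 means the odds triple. The probability itself rises by less than a factor of three.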
USING REGRESSION
MODELS IN DATA
JOURNALISM
Some examples
"Does School Pay Off? How Much?" - El Financiero (Costa Rica),
winner of the Data Journalism Awards 2014.
http://www.elfinancierocr.com/gnfactory/especiales/2015/calculadorasalarial/
Some examples
“Presidential Pardons Heavily Favor Whites” - ProPublica
http://www.propublica.org/article/shades-of-mercy-presidential-forgiveness-heavily-favors-whites
Methodology: http://www.propublica.org/article/how-propublica-analyzed-pardon-data
Some advice
• Statistical analysis can be complex. If you’re not
sure, seek advice from an expert!
• Be transparent about your methodology.
• Study a lot!
• https://www.coursera.org/ Free courses!
References
-Wooldridge (2010). Introductory Econometrics
-Long (1997). Regression models for categorical and
limited dependent variables
-Costa Rica National Survey of Income and Spending
(2004).
THANKS :)
@milamila07
schoolofdata.org
