An Application of
Ordinary Least Squares
Regression & Stage-2
Regression to Remove
Endogeneity Issues in
Causal Inference
Author: Anthony Mok
Date: 18 Nov 2023
Email: xxiaohao@yahoo.com
FACTORS INFLUENCING
MEDICAL EXPENSES
AGENDA
• Endogeneity Issues in Causal Inference
• Ordinary Least Squares (OLS) Regression & Stage-2 Regression
• Relationship Between OLS Regression/Stage-2 Regression and Difference in Differences and Interaction Term
• Project’s Primary Goals
• Context
• Dataset & Modelling Strategies
• Findings & Conclusions
ENDOGENEITY ISSUES IN CAUSAL INFERENCE
In causal inference, endogeneity issues arise when the variable that is causing an effect (independent variable) is
itself influenced by the outcome variable (dependent variable) or by other unobserved factors
Reverse Causality
A situation where the independent
variable is influenced by the
dependent variable, making it
impossible to tell which one truly
causes the other without further
analysis
Unobserved Factors
These are like hidden players in
the causal game. They influence
both the independent and
dependent variables, but you
can't directly measure them.
These create a tangled web of relationships that makes it difficult to isolate the true causal effect of the
independent variable on the dependent variable
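To make the "unobserved factors" case concrete, here is a small simulation. It is a sketch with made-up coefficients (0.8, 2.0 and a true effect of 1.0 are arbitrary assumptions, not values from this project). The unobserved factor `u` drives both `x` and `y`, but the regression only sees `x` and `y`, so `u` ends up in the error term, which is then correlated with `x`.

```python
import numpy as np


def simulate_confounded(n=100_000, true_effect=1.0, seed=0):
    """Simulate data where an unobserved factor u drives both x and y."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                      # unobserved factor: never shown to the model
    x = 0.8 * u + rng.normal(size=n)            # independent variable, partly driven by u
    y = true_effect * x + 2.0 * u + rng.normal(size=n)  # outcome, also driven by u
    return x, y


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (intercept included)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


x, y = simulate_confounded()
print(f"true effect: 1.0, OLS estimate: {ols_slope(x, y):.3f}")
```

With these assumed coefficients the OLS slope converges to 1 + 2 × 0.8 / 1.64 ≈ 1.98 rather than 1.0: the regression attributes part of the effect of `u` to `x`.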
An Application of Ordinary Least Square Regression & Stage-2 Regression
OLS REGRESSION & STAGE-2 REGRESSION
In causal inference, endogeneity issues arise when the variable that is causing an effect (independent variable) is
itself influenced by the outcome variable (dependent variable) or by other unobserved factors; in regression terms, the
independent variable is correlated with the error term
Ordinary Least Squares
(OLS) Regression
A general-purpose statistical method
used to estimate the linear relationship
between a dependent variable and one
or more independent variables: fits a
straight line to the data points to
minimise the sum of the squared residuals
(the vertical distances between the data
points and the regression line)
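A minimal sketch of what OLS computes: NumPy's least-squares solver finds the coefficients that minimise the sum of squared residuals. The synthetic data and the coefficients 1.0, 0.5 and -2.0 are illustrative assumptions.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients (intercept first) minimising the sum of squared residuals."""
    X1 = np.column_stack([np.ones(len(y)), X])          # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)       # least-squares solve
    return beta


def sum_squared_residuals(X, y, beta):
    """Sum of squared vertical distances between the data and the fitted values."""
    X1 = np.column_stack([np.ones(len(y)), X])
    resid = y - X1 @ beta
    return float(resid @ resid)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + X @ np.array([0.5, -2.0]) + rng.normal(scale=0.3, size=200)
beta = fit_ols(X, y)
print(beta)  # close to [1.0, 0.5, -2.0]
```

Nudging any coefficient away from `beta` can only increase `sum_squared_residuals`, which is exactly the "minimise the sum of squared residuals" property described above.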
Stage-2 Regression
A specific statistical technique, also known as
two-stage least squares (2SLS) and commonly used in
instrumental variable (IV) regression, deployed to
address endogeneity issues
First stage
An instrumental variable
(correlated with the endogenous
independent variable but not with
the error term) is used to predict
the endogenous variable
Second stage
The predicted values from the
first stage are used as an
independent variable in a
regression with the dependent
variable
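The two stages above can be sketched by hand with NumPy on simulated data. The data-generating process is a hypothetical assumption: `z` plays the role of a valid instrument (it moves `x` but is unrelated to the unobserved `u`), and the true effect of `x` on `y` is 1.0.

```python
import numpy as np


def add_const(*cols):
    """Stack columns next to an intercept column."""
    return np.column_stack([np.ones(len(cols[0])), *cols])


def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(42)
n = 50_000
z = rng.normal(size=n)                      # instrument: moves x, unrelated to u
u = rng.normal(size=n)                      # unobserved factor
x = 0.7 * z + 0.8 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 * x + 2.0 * u + rng.normal(size=n)  # true effect of x on y is 1.0

# Naive OLS: biased, because x is correlated with the error term (which contains u)
beta_ols = ols(add_const(x), y)

# First stage: regress the endogenous x on the instrument z and keep the fitted values
x_hat = add_const(z) @ ols(add_const(z), x)

# Second stage: regress y on the first-stage fitted values
beta_2sls = ols(add_const(x_hat), y)

print(f"OLS slope:  {beta_ols[1]:.3f}")
print(f"2SLS slope: {beta_2sls[1]:.3f}")
```

Under these assumptions the OLS slope drifts towards about 1.75, while the second-stage slope recovers roughly 1.0, because `x_hat` keeps only the variation in `x` that comes from `z`.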
STAGE-2 REGRESSION & DID – THE CONNECTIONS
Difference In Differences & Stage-2 Regression
are separate techniques used in causal inference
Difference In Differences (DID)
A research design & estimation technique
used to isolate the causal effect of a
treatment/policy intervention by
comparing changes over time between a
Treatment Group & a Control Group
Stage-2 Regression
2-stage regression is a statistical
technique used to address endogeneity
issues in regression models
For example, apply DID to compare
the change in test scores for
programme participants before and
after the programme relative to the
change in test scores for non-participants
over the same period
Recognise that self-selection into the programme
might still create endogeneity issues
Within the DID framework, use 2-stage
regression with an instrumental variable
(e.g., distance to the programme) to
address this endogeneity and obtain
more reliable estimates of the
programme's true effect
Although distinct, DID and
2-stage regression can be
used sequentially in certain
situations
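The agenda links DID to an interaction term. Here is a minimal sketch of that link on hypothetical test-score data (every number below is an illustrative assumption). In a 2×2 design, the coefficient on the treated × post interaction in an OLS regression equals the difference-in-differences of the four group means.

```python
import numpy as np


def did_by_means(y, treated, post):
    """2x2 DID: (treated after - treated before) - (control after - control before)."""
    def cell_mean(t, p):
        return y[(treated == t) & (post == p)].mean()
    return float((cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0)))


def did_by_regression(y, treated, post):
    """Same estimate, read off the coefficient on the treated x post interaction term."""
    X = np.column_stack([np.ones(len(y)), treated, post, treated * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[3])


# Hypothetical data: participants start 5 points higher, everyone gains 3 points over time,
# and the programme adds a further 4 points for participants only.
rng = np.random.default_rng(0)
n = 20_000
treated = rng.integers(0, 2, size=n)
post = rng.integers(0, 2, size=n)
scores = 60 + 5 * treated + 3 * post + 4 * treated * post + rng.normal(scale=2.0, size=n)

print(did_by_means(scores, treated, post), did_by_regression(scores, treated, post))
```

Both functions return the same number (close to the assumed programme effect of 4), because a regression with both dummies and their interaction reproduces the four cell means exactly.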
PROJECT’S PRIMARY GOALS
To identify factors influencing medical expenses given the variables, while removing endogeneity issues
CONTEXT
Good health insurance is one that can cover a
maximum amount of medical expenses so that people
don't have to worry about paying medical bills
As a health insurer, the company has seen its sales
fall significantly over time, which is causing concern
The firm intends to analyse the factors that
determine medical expenses in order to improve its
sales in the coming fiscal year
By conducting the study, the firm will gain a better
understanding of its customers' needs and be able
to develop its marketing strategies accordingly
DATASET & MODELLING STRATEGIES
1. OLS REGRESSION
Observe outcomes from OLS regression using the independent variables
2. STAGE-2 REGRESSION
• Observe outcomes from Stage-1 regression with the endogenous variable as the target variable
• Observe Stage-2 regression using the predicted endogenous variable
3. INSIGHTS
Form insights from the results of the OLS and Stage-2 regressions
SOCIAL SECURITY INCOME (SSI) RATIO
Social Security Income (SSI) is provided to senior citizens
The SSI ratio is calculated from multiple parameters such as years of earning, AIME [Average Indexed Monthly Earnings],
individual assets, and number of dependants
Based on these parameters, the governing body decides the ratio of SSI to be provided to each individual
This final value is provided in the dataset as ssiratio and can be used directly for further analysis
OLS REGRESSION WITH INDEPENDENT VARIABLES
Of the four independent (explanatory) variables, only 'illnesses' and 'healthinsu' have p-values below 0.05. They are
statistically significant at the 5% level: relationships this strong with 'logmedexpense' would be unlikely to arise by chance
alone if the true coefficients were zero. Since the outcome is the log of medical expenses, each additional illness raises
'logmedexpense' by about 0.44 units, while having health insurance raises it by about 0.07 units. Because the model is
additive (it contains no interaction term between these two variables), their effects simply add up: a patient who has one
additional illness and at the same time holds valid health insurance would experience a total increase of 0.5156 units in 'logmedexpense'
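Because the outcome is a log, a coefficient is easier to read once it is converted into a percentage change in the level of medical expenses. This is a minimal sketch, assuming 'logmedexpense' is the natural log of medical expenses (the slides do not say which base was used). It plugs in the coefficients as reported in the slides (0.44 for illnesses, 0.075 for healthinsu, 0.5156 combined).

```python
import math


def log_change_to_pct(delta_log):
    """Convert a change in a natural-log outcome into the implied % change in its level."""
    return (math.exp(delta_log) - 1.0) * 100.0


# Coefficients as reported on the slides (the combined value is their sum)
effects = [("one more illness", 0.44), ("health insurance", 0.075), ("both", 0.5156)]
for label, delta in effects:
    print(f"{label}: {delta:+.4f} log units = {log_change_to_pct(delta):+.1f}% medical expenses")
```

Under that assumption, the combined 0.5156 corresponds to roughly a 67% increase in medical expenses rather than a 0.5156-unit increase in the expenses themselves.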
STAGE - 1 REGRESSION
• In a Linear Regression Analysis, the residual is the
difference between the observed value and the
predicted value of the dependent variable
• For the observation shown in this regression's
output, a residual of 0.4544 means that its
predicted value is 0.4544 units less than its
observed value.
• In other words, the model under-predicted the
value of the dependent variable for this
observation by this amount
Stage - 1 Regression with the endogenous variable as target variable
STAGE - 1 REGRESSION
• Residuals are used to assess how well the model fits the data. When the
residuals are randomly scattered around zero with no visible pattern, this
suggests that the model fits the data well
• However, the histogram of the residuals (left) does not show values
clustered around zero. Instead, the model mostly over-predicted or
under-predicted the value of the dependent variable across the 10,089
observations; such patterns in the residuals may suggest that the model is
not a good fit for the data
• Moreover, the predicted values for the 10,089 observations average 0.38
and lie far from the individual observed values of the dependent variable.
This again suggests that the model is not a good fit for the data
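A sketch of two residual properties worth keeping in mind when reading such a histogram. It assumes the first stage is OLS with an intercept and that the endogenous variable is a 0/1 indicator such as 'healthinsu'; the slides do not state its coding, and the data below are synthetic. With an intercept, the residuals always average exactly zero and the mean prediction always equals the mean observed value. With a 0/1 outcome, however, individual residuals bunch away from zero, so a histogram that is not centred on zero is expected rather than surprising.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
z = rng.normal(size=n)                                   # hypothetical instrument
p = 1 / (1 + np.exp(0.5 + 0.8 * z))                      # probability of being insured
d = (rng.uniform(size=n) < p).astype(float)              # 0/1 endogenous variable

X = np.column_stack([np.ones(n), z])                     # first stage with an intercept
beta, *_ = np.linalg.lstsq(X, d, rcond=None)
fitted = X @ beta
resid = d - fitted                                       # observed minus predicted

print(f"mean residual: {resid.mean():.2e}")              # zero up to rounding error
print(f"mean fitted {fitted.mean():.3f} vs mean observed {d.mean():.3f}")
print(f"share of residuals within 0.1 of zero: {(np.abs(resid) < 0.1).mean():.3f}")
```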
STAGE - 1 REGRESSION
• When the Stage-1 regression model is not a good fit for the data, the
model is not accurately capturing the relationship between the
independent and dependent variables
• Several reasons could cause this, such as omitted variables, an
incorrect functional form, or an invalid instrument. In such cases, the
estimates produced by the model may not accurately reflect the true
relationship between the variables
• To improve the fit of the model, one could include additional relevant
variables, change the functional form of the model, or use a different
instrument, so that the first stage satisfies the conditions of relevance
and exogeneity
• However, since no additional information is provided in the project,
improving the model is infeasible
STAGE - 2 REGRESSION
The statistics suggest that an additional illness and an additional unit of income would increase log medical expenses
by 0.449 and 0.098 units respectively. Conversely, an additional unit of age and having health insurance would lower log
medical expenses by 0.012 and 0.852 units respectively. All four independent variables have p-values below 0.05, which
suggests that they are statistically significant at the 5% level rather than artefacts of chance
Stage - 2 Regression using predicted endogenous variable
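One caution when the second stage is run as an ordinary regression on the predicted endogenous variable: the coefficients are the 2SLS coefficients, but the standard errors (and hence p-values) that a plain OLS routine reports are not the correct 2SLS ones, because it computes residuals with the predicted rather than the actual endogenous variable. Dedicated IV routines handle this automatically; whether it affects the p-values above depends on how the second stage was run. Below is a minimal sketch of 2SLS with homoskedastic standard errors. The function name `two_sls` and the simulated data are illustrative assumptions.

```python
import numpy as np


def two_sls(y, x_endog, z, exog=None):
    """2SLS with homoskedastic standard errors; constant first, endogenous coefficient last."""
    n = len(y)
    W = np.ones((n, 1)) if exog is None else np.column_stack([np.ones(n), exog])
    X = np.column_stack([W, x_endog])           # regressors in the structural equation
    Z = np.column_stack([W, z])                 # instrument set: exogenous variables + z
    # First stage: project every regressor onto the instrument set
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ gamma
    # Second stage: OLS of y on the projected regressors
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    # Residuals must use the ACTUAL regressors X, not X_hat
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    return beta, np.sqrt(np.diag(cov))


# Hypothetical data: x is endogenous through u, z is a valid instrument, true effect is 1.0
rng = np.random.default_rng(3)
n = 20_000
z, u = rng.normal(size=n), rng.normal(size=n)
x = 0.7 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 * x + 2.0 * u + rng.normal(size=n)
beta, se = two_sls(y, x, z)
print(f"2SLS effect of x: {beta[-1]:.3f} (se {se[-1]:.3f})")
```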
INSIGHTS FROM OLS & STAGE - 2 REGRESSION
• In the Stage-1 analysis, the endogenous
variable is regressed on the instrumental
variable. Since the p-value for the
instrumental variable at this stage is below
0.05, the instrumental variable is
significantly related to the endogenous variable
• This is known as the relevance condition for an
instrumental variable: the instrument is
correlated with the endogenous variable and
can be used to predict it
• If the first-stage F-statistic were calculated,
using tools such as R or Python, the strength or
weakness of the instrument could be assessed
further
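The first-stage F-statistic can be computed by comparing the first-stage fit with and without the instrument. A widely used rule of thumb treats a first-stage F below about 10 as a sign of a weak instrument. This is a minimal NumPy sketch. The function name `first_stage_f` and the simulated strong and weak instruments are illustrative assumptions; with a single instrument, this F equals the square of the instrument's (homoskedastic) t-statistic.

```python
import numpy as np


def first_stage_f(x_endog, z, exog=None):
    """F-statistic for H0: all instrument coefficients are zero in the first-stage regression."""
    n = len(x_endog)
    W = np.ones((n, 1)) if exog is None else np.column_stack([np.ones(n), exog])
    Z = np.column_stack([W, z])                  # unrestricted: controls + instrument(s)

    def rss(A):
        coef, *_ = np.linalg.lstsq(A, x_endog, rcond=None)
        r = x_endog - A @ coef
        return float(r @ r)

    q = Z.shape[1] - W.shape[1]                  # number of instruments being tested
    rss_r, rss_u = rss(W), rss(Z)                # restricted (no instrument) vs unrestricted
    return ((rss_r - rss_u) / q) / (rss_u / (n - Z.shape[1]))


rng = np.random.default_rng(5)
z = rng.normal(size=2_000)
x_strong = 0.7 * z + rng.normal(size=2_000)     # instrument clearly moves x
x_weak = 0.02 * z + rng.normal(size=2_000)      # instrument barely moves x
print(f"strong: F = {first_stage_f(x_strong, z):.1f}; weak: F = {first_stage_f(x_weak, z):.1f}")
```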
The linear regression results suggest that people with health insurance would
experience a 0.075-unit increase in log medical expenses, whereas the two-stage results
suggest that they would experience a 0.852-unit decrease. In the first stage, a one-unit
increase in the SSI ratio is associated with a 0.1998-unit decrease in health insurance.
Because the two estimates point in opposite directions, endogeneity is suspected in the
OLS estimate
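The suspicion raised by the sign flip can also be checked formally. One standard option is the regression-based (control-function) Durbin-Wu-Hausman test: add the first-stage residual to the OLS equation and test whether its coefficient is zero. A significant coefficient suggests that the regressor is endogenous. This is a sketch assuming homoskedastic errors. The function name `dwh_control_function` and the simulated data are illustrative, not part of the original analysis.

```python
import numpy as np


def dwh_control_function(y, x_endog, z, exog=None):
    """Control-function Durbin-Wu-Hausman test; returns (coefficient, t-statistic)."""
    n = len(y)
    W = np.ones((n, 1)) if exog is None else np.column_stack([np.ones(n), exog])
    Z = np.column_stack([W, z])
    g, *_ = np.linalg.lstsq(Z, x_endog, rcond=None)
    v_hat = x_endog - Z @ g                      # first-stage residual
    A = np.column_stack([W, x_endog, v_hat])     # OLS equation augmented with v_hat
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    sigma2 = resid @ resid / (n - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return float(b[-1]), float(b[-1] / se[-1])


# Hypothetical data in which x really is endogenous (it shares the unobserved u with y)
rng = np.random.default_rng(11)
n = 5_000
z, u = rng.normal(size=n), rng.normal(size=n)
x = 0.7 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 * x + 2.0 * u + rng.normal(size=n)
coef, t_stat = dwh_control_function(y, x, z)
print(f"coefficient on first-stage residual: {coef:.3f}, t = {t_stat:.1f}")
```

A large |t| (well beyond about 2) on the residual term is evidence that OLS and 2SLS genuinely differ, which is the formal counterpart of the opposite-sign estimates noted above.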
An Application of
Ordinary Least Square
Regression & Stage-2
Regression to Remove
Endogeneity Issues in
Casual Inference
Author: Anthony Mok
Date: 18 Nov 2023
Email: xxiaohao@yahoo.com
FACTORS INFLUENCING
MEDICAL EXPENSES

More Related Content

Similar to Ordinary Least Square Regression & Stage-2 Regression - Factors Influencing Medical Expenses

Em score-medical-decision-making
Em score-medical-decision-makingEm score-medical-decision-making
Em score-medical-decision-making
SuperCoder LLC
 
Add slides
Add slidesAdd slides
Add slides
Rupa D
 
USMLE CK SCORE.PDF
USMLE CK SCORE.PDFUSMLE CK SCORE.PDF
USMLE CK SCORE.PDF
Said Sarhan
 
Christie tiegland state_veterans_homes_not_your_average_nursing_home
Christie tiegland state_veterans_homes_not_your_average_nursing_homeChristie tiegland state_veterans_homes_not_your_average_nursing_home
Christie tiegland state_veterans_homes_not_your_average_nursing_home
Shane Newman
 
Module 08 Assignment – Nursing InterventionsPurpose of the Assig
Module 08 Assignment – Nursing InterventionsPurpose of the AssigModule 08 Assignment – Nursing InterventionsPurpose of the Assig
Module 08 Assignment – Nursing InterventionsPurpose of the Assig
IlonaThornburg83
 

Similar to Ordinary Least Square Regression & Stage-2 Regression - Factors Influencing Medical Expenses (20)

200994363
200994363200994363
200994363
 
Em score-medical-decision-making
Em score-medical-decision-makingEm score-medical-decision-making
Em score-medical-decision-making
 
Fuzzy Regression Model for Knee Osteoarthritis Disease Diagnosis
Fuzzy Regression Model for Knee Osteoarthritis Disease DiagnosisFuzzy Regression Model for Knee Osteoarthritis Disease Diagnosis
Fuzzy Regression Model for Knee Osteoarthritis Disease Diagnosis
 
OCI sensitvity to change
OCI sensitvity to changeOCI sensitvity to change
OCI sensitvity to change
 
Market Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.pptMarket Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.ppt
 
Add slides
Add slidesAdd slides
Add slides
 
Cost Prediction of Health Insurance
Cost Prediction of Health InsuranceCost Prediction of Health Insurance
Cost Prediction of Health Insurance
 
USMLE CK SCORE.PDF
USMLE CK SCORE.PDFUSMLE CK SCORE.PDF
USMLE CK SCORE.PDF
 
Article
ArticleArticle
Article
 
ANALYSIS OF PATIENT WAITING TIME FOR HOSPITAL ADMISSION AND DISCHARGE PROCESS
ANALYSIS OF PATIENT WAITING TIME FOR HOSPITAL ADMISSION AND DISCHARGE PROCESSANALYSIS OF PATIENT WAITING TIME FOR HOSPITAL ADMISSION AND DISCHARGE PROCESS
ANALYSIS OF PATIENT WAITING TIME FOR HOSPITAL ADMISSION AND DISCHARGE PROCESS
 
Satisfaction and loyalty
Satisfaction and loyaltySatisfaction and loyalty
Satisfaction and loyalty
 
Intensive Care Unit Scoring Systems
Intensive Care Unit Scoring SystemsIntensive Care Unit Scoring Systems
Intensive Care Unit Scoring Systems
 
Methodologies for impact assessment of post harvest technologies
Methodologies for impact assessment of post harvest technologiesMethodologies for impact assessment of post harvest technologies
Methodologies for impact assessment of post harvest technologies
 
Poster for Sleep Final AARC
Poster for Sleep Final AARCPoster for Sleep Final AARC
Poster for Sleep Final AARC
 
Christie tiegland state_veterans_homes_not_your_average_nursing_home
Christie tiegland state_veterans_homes_not_your_average_nursing_homeChristie tiegland state_veterans_homes_not_your_average_nursing_home
Christie tiegland state_veterans_homes_not_your_average_nursing_home
 
Correlation & Regression.pptx
Correlation & Regression.pptxCorrelation & Regression.pptx
Correlation & Regression.pptx
 
Determining Condition Monitoring
Determining Condition MonitoringDetermining Condition Monitoring
Determining Condition Monitoring
 
Assessing the costs and effects of anti-retroviral therapy task shifting from...
Assessing the costs and effects of anti-retroviral therapy task shifting from...Assessing the costs and effects of anti-retroviral therapy task shifting from...
Assessing the costs and effects of anti-retroviral therapy task shifting from...
 
Module 08 Assignment – Nursing InterventionsPurpose of the Assig
Module 08 Assignment – Nursing InterventionsPurpose of the AssigModule 08 Assignment – Nursing InterventionsPurpose of the Assig
Module 08 Assignment – Nursing InterventionsPurpose of the Assig
 
David Dranove
David DranoveDavid Dranove
David Dranove
 

More from ThinkInnovation

Decision Making Under Uncertainty - Decide Whether Or Not to Take Precautions
Decision Making Under Uncertainty - Decide Whether Or Not to Take PrecautionsDecision Making Under Uncertainty - Decide Whether Or Not to Take Precautions
Decision Making Under Uncertainty - Decide Whether Or Not to Take Precautions
ThinkInnovation
 
Breakfast Talk - Manage Projects
Breakfast Talk - Manage ProjectsBreakfast Talk - Manage Projects
Breakfast Talk - Manage Projects
ThinkInnovation
 
Think innovation issue 4 share - scamper
Think innovation issue 4   share - scamperThink innovation issue 4   share - scamper
Think innovation issue 4 share - scamper
ThinkInnovation
 

More from ThinkInnovation (17)

Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
Decision Making Under Uncertainty - Predict the Chances of a Person Suffering...
Decision Making Under Uncertainty - Predict the Chances of a Person Suffering...Decision Making Under Uncertainty - Predict the Chances of a Person Suffering...
Decision Making Under Uncertainty - Predict the Chances of a Person Suffering...
 
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
Decision Making Under Uncertainty - Decide Whether Or Not to Take Precautions
Decision Making Under Uncertainty - Decide Whether Or Not to Take PrecautionsDecision Making Under Uncertainty - Decide Whether Or Not to Take Precautions
Decision Making Under Uncertainty - Decide Whether Or Not to Take Precautions
 
Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in Logistics
 
Create Data Model & Conduct Visualisation in Power BI Desktop
Create Data Model & Conduct Visualisation in Power BI DesktopCreate Data Model & Conduct Visualisation in Power BI Desktop
Create Data Model & Conduct Visualisation in Power BI Desktop
 
Using DAX & Time-based Analysis in Data Warehouse
Using DAX & Time-based Analysis in Data WarehouseUsing DAX & Time-based Analysis in Data Warehouse
Using DAX & Time-based Analysis in Data Warehouse
 
Creating Data Warehouse Using Power Query & Power Pivot
Creating Data Warehouse Using Power Query & Power PivotCreating Data Warehouse Using Power Query & Power Pivot
Creating Data Warehouse Using Power Query & Power Pivot
 
Unlocking New Insights Into the World of European Soccer Through the European...
Unlocking New Insights Into the World of European Soccer Through the European...Unlocking New Insights Into the World of European Soccer Through the European...
Unlocking New Insights Into the World of European Soccer Through the European...
 
Breakfast Talk - Manage Projects
Breakfast Talk - Manage ProjectsBreakfast Talk - Manage Projects
Breakfast Talk - Manage Projects
 
Think innovation issue 4 share - scamper
Think innovation issue 4   share - scamperThink innovation issue 4   share - scamper
Think innovation issue 4 share - scamper
 
SCAMPER
SCAMPERSCAMPER
SCAMPER
 
Reverse Assumption Method
Reverse Assumption MethodReverse Assumption Method
Reverse Assumption Method
 
Psyche of Facilitation - The New Language of Facilitating Conversations
Psyche of Facilitation - The New Language of Facilitating ConversationsPsyche of Facilitation - The New Language of Facilitating Conversations
Psyche of Facilitation - The New Language of Facilitating Conversations
 
Visual Connection - Ideation Through Word Association
Visual Connection - Ideation Through Word AssociationVisual Connection - Ideation Through Word Association
Visual Connection - Ideation Through Word Association
 

Recently uploaded

Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
shivangimorya083
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
JohnnyPlasten
 

Recently uploaded (20)

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 

Ordinary Least Square Regression & Stage-2 Regression - Factors Influencing Medical Expenses

  • 1. An Application of Ordinary Least Square Regression & Stage-2 Regression to Remove Endogeneity Issues in Casual Inference Author: Anthony Mok Date: 18 Nov 2023 Email: xxiaohao@yahoo.com FACTORS INFLUENCING MEDICAL EXPENSES
  • 2. • Endogeneity Issues in Casual Inference • Ordinary Least Square (OLS) Regression & Stage-2 Regression • Relationship Between OLS Regression/Stage-2 Regression and Difference in Difference and Interaction Term • Project’s Primary goals • Context • Dataset & Modelling Strategies • Findings & Conclusions PRESENTATION TITLE 2 AGENDA
  • 3. ENDOGENEITY ISSUES IN CASUAL INFERENCE 3 In casual inference, endogeneity issues arise when the variable that is causing an effect (independent variable) is itself influenced by the outcome variable (dependent variable) or other unobserved factors Reverse Causality A situation where the independent variable is influenced by the dependent variable, making it impossible to tell which one truly causes the other without further analysis Unobserved Factors These are like hidden players in the causal game. They influence both the independent and dependent variables, but you can't directly measure them. These create a tangled web of relationships that makes it difficult to isolate the true causal effect of the independent variable on the dependent variable An Application of Ordinary Least Square Regression & Stage-2 Regression
  • 4. OLS REGRESSION & STAGE-2 REGRESSION 4 In casual inference, endogeneity issues arise when the variable that is causing an effect (independent variable) is itself influenced by the outcome variable (dependent variable) or other unobserved factors; the independent variable in a regression is correlated with the error term Ordinary Least Squares (OLS) Regression A general purpose statistical method used to estimate the linear relationship between a dependent variable and one or more independent variables: fits a straight line to the data points to minimise the sum of the squared residuals (the vertical distances between the data points and the regression line) Stage-2 Regression A specific statistical technique, often used in instrumental variable (IV) regression, deployed to address endogeneity issues An Application of Ordinary Least Square Regression & Stage-2 Regression First stage An instrument variable (correlated with the endogenous independent variable but not with the error term) is used to predict the endogenous variable Second stage The predicted values from the first stage are used as an independent variable in a regression with the dependent variable
  • 5. STAGE-2 REGRESSION & DID – THE CONNECTIONS 5 Difference In Differences & Stage-2 Regression are separate techniques used in causal inference Difference In Differences (DID) A research design & estimation technique used to isolate the causal effect of a treatment/policy intervention by comparing changes over time between a Treatment Group & a Control Group Stage-2 Regression 2-stage regression is a statistical technique used to address endogeneity issues in regression models An Application of Ordinary Least Square Regression & Stage-2 Regression For example, apply DID to compare the change in test scores for programme participants before and after the programme relative to the change in test scores for non- participants over the same period Within the DID framework, use 2-stage regression with an instrument variable (e.g., distance to the programme) to address this endogeneity and obtain more reliable estimates of the programme's true effect Although distinct, DID and 2-stage regression can be used sequentially in certain situations Recognise that self-selection might still create endogeneity issues
  • 6. PROJECT’S PRIMARY GOALS To i d e n t i f y f a c t o r s i n f l u e n c i n g m e d i c a l ex p e n s e s g i ve n t h e va r i a b l e s w h i l e r e m o v i n g e n d o ge n e i t y i s s u e An Application of Ordinary Least Square Regression & Stage-2 Regression
  • 7. CONTEXT Good health insurance is one that can cover a maximum amount of medical expenses so that people don't have to worry about paying medical bills As a health insurance company, the company saw its sales fall significantly over time, something that is causing concerns It is the firm's intention to analyse factors that determine medical expenses in order to improve their sales in the coming fiscal year By conducting the study, they will have a better understanding of their customers' needs and be able to develop their marketing strategies accordingly 4/21/2024 PRESENTATION TITLE 7 An Application of Ordinary Least Square Regression & Stage-2 Regression
  • 8. DATASET & MODELLING STRATEGIES
    1. OLS REGRESSION: Observe outcomes from OLS regression using the independent variables
    2. STAGE-2 REGRESSION:
    • Observe outcomes from Stage-1 Regression with the endogenous variable as the target variable
    • Observe outcomes from Stage-2 Regression using the predicted endogenous variable
    3. INSIGHTS: Form insights from the results of the OLS regression and the Stage-2 Regression
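To illustrate why the strategy compares OLS with a two-stage estimate (commonly called two-stage least squares, 2SLS), here is a self-contained simulation on made-up data, not the project's dataset. An unobserved factor `u` drives both the endogenous regressor `x` and the outcome `y`, while the instrument `z` affects `y` only through `x`. The true effect of `x` is set to 2.0; all variable names and parameter values are assumptions for this sketch.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Simple OLS with an intercept: returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(0)
n = 20_000
TRUE_EFFECT = 2.0  # hypothetical causal effect of x on y

z = [random.gauss(0, 1) for _ in range(n)]  # instrument
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved factor
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous regressor
y = [TRUE_EFFECT * xi + 3 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Step 1: OLS of y on x ignores u, so the slope absorbs part of u's effect
_, ols_slope = fit_line(x, y)

# Step 2a: Stage-1 regression of the endogenous x on the instrument z
a1, b1 = fit_line(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Step 2b: Stage-2 regression of y on the predicted x
_, tsls_slope = fit_line(x_hat, y)

print(f"OLS slope:       {ols_slope:.2f}")
print(f"Two-stage slope: {tsls_slope:.2f}")
```

In this setup Var(x) = 3 and Cov(x, u) = 1, so the OLS slope converges to 2 + 3 × 1/3 = 3.0 rather than the true 2.0, while the two-stage slope converges to 2.0. Step 3 (insights) then amounts to comparing the two estimates.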
  • 9. SOCIAL SECURITY INCOME (SSI) RATIO
    Social Security Income (SSI) is provided to senior citizens.
    The SSI ratio is calculated from multiple parameters, such as years of earning, AIME (Average Indexed Monthly Earnings), individual assets and number of dependants.
    Based on these parameters, the governing body decides the ratio of SSI to be provided to each individual.
    This final value is provided in the dataset as ssiratio and can be used directly in further analysis.
  • 10. OLS REGRESSION WITH INDEPENDENT VARIABLES
    Of the four independent (explanatory) variables, only 'illnesses' and 'healthinsu' have p-values below 0.05. They are statistically significant at the 5% level: their estimated relationships with 'logmedexpense' are unlikely to be due to chance alone.
    Because the dependent variable is the log of medical expenses, the coefficients are in log units: an additional illness raises logmedexpense by about 0.44, and having health insurance raises it by about 0.075, holding the other variables constant.
    The model is additive (it contains no interaction term between these two variables), so their effects sum: a patient with an additional illness who also holds valid health insurance would see logmedexpense rise by a total of 0.5156 units.
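Because 'logmedexpense' is in log units, a coefficient b corresponds to a proportional change of exp(b) - 1 in medical expenses themselves. The sketch below applies that conversion to the rounded coefficients on this slide, assuming 'logmedexpense' is the natural log of medical expenses (the slide does not state the log base); the helper name is hypothetical.

```python
import math


def log_coef_to_pct(b):
    """Percentage change in the outcome implied by coefficient b in a
    natural-log outcome model, holding the other regressors constant."""
    return (math.exp(b) - 1) * 100


# Rounded OLS coefficients from the slide (outcome assumed to be ln(medical expenses))
effects = {
    "one additional illness": 0.44,
    "having health insurance": 0.075,
    "both together": 0.5156,
}

for label, b in effects.items():
    print(f"{label}: {log_coef_to_pct(b):+.1f}%")
```

Note that in percentage terms the two effects multiply rather than add: exp(0.44) × exp(0.075) = exp(0.515).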
  • 11. STAGE-1 REGRESSION: Stage-1 Regression with the endogenous variable as the target variable
    • In linear regression analysis, the residual is the difference between the observed value and the predicted value of the dependent variable (residual = observed - predicted).
    • For this regression, a residual of 0.4544 means that the predicted value for this observation is 0.4544 units less than the observed value.
    • In other words, the model under-predicted the value of the dependent variable for this observation by this amount.
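A minimal sketch of how the residual's sign maps to under- or over-prediction. The observed/predicted pair is hypothetical, chosen only to reproduce a residual of 0.4544; the helper names are assumptions.

```python
def residual(observed, predicted):
    """Residual = observed value minus predicted value."""
    return observed - predicted


def describe(res, tol=1e-12):
    """Positive residual: under-prediction; negative residual: over-prediction."""
    if res > tol:
        return f"under-predicted by {res:.4f}"
    if res < -tol:
        return f"over-predicted by {-res:.4f}"
    return "predicted exactly"


# Hypothetical observation: observed value 1.0, predicted value 0.5456
r = residual(1.0, 0.5456)
print(round(r, 4), describe(r))
```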
  • 12. STAGE-1 REGRESSION
    • Residuals are used to assess how well the model fits the data. When the residuals are randomly scattered around zero, the model is likely a good fit for the data.
    • However, the histogram of the residuals (left) does not show values concentrated around zero. Instead, the residuals for the 10,089 observations form clusters on either side of zero: the model substantially over-predicts for some observations and under-predicts for others. Such patterns in the residuals suggest that the model is not a good fit for the data.
    • Likewise, the predicted values average 0.38, which is far from every observed value of the dependent variable, since health insurance status is either 0 (uninsured) or 1 (insured). This again suggests that the model is not a good fit for the data.
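When the Stage-1 target is a 0/1 indicator such as health insurance status, a straight-line fit cannot produce residuals tightly scattered around zero: observations equal to 1 get residual 1 - prediction and those equal to 0 get -prediction, so the residuals fall into two clusters. The sketch below shows this on made-up data (not the project's). It also shows that, when the Stage-1 model includes an intercept, the OLS predictions average exactly to the share of 1s, so an average prediction of 0.38 would match the share of insured people in the sample.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Simple OLS with an intercept: returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(1)
n = 5_000
# Hypothetical instrument values and a 0/1 target that becomes less likely as z rises
z = [random.uniform(0, 2) for _ in range(n)]
d = [1 if random.random() < 0.6 - 0.2 * zi else 0 for zi in z]

a, b = fit_line(z, d)
pred = [a + b * zi for zi in z]
resid = [di - pi for di, pi in zip(d, pred)]

# Split residuals by the observed value of the 0/1 target
ones = [r for r, di in zip(resid, d) if di == 1]
zeros = [r for r, di in zip(resid, d) if di == 0]
print(f"mean prediction {mean(pred):.3f} vs share of 1s {mean(d):.3f}")
print(f"residuals when d=1: {min(ones):+.2f} to {max(ones):+.2f}")
print(f"residuals when d=0: {min(zeros):+.2f} to {max(zeros):+.2f}")
```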
  • 13. STAGE-1 REGRESSION
    • When the Stage-1 Regression model is not a good fit for the data, the model is not accurately capturing the relationship between the independent and dependent variables.
    • Possible reasons include omitted variables, an incorrect functional form or an invalid instrument. In such cases, the estimates produced by the model may not accurately reflect the true relationship between the variables.
    • To improve the fit of the model, one could include additional relevant variables, change the functional form of the model, or use a different instrument, so that the first stage satisfies the relevance and exogeneity conditions.
    • However, since no additional information is provided in the project, improving the model is infeasible.
  • 14. STAGE-2 REGRESSION: Stage-2 Regression using the predicted endogenous variable
    The statistics suggest that an additional illness and an additional unit of income would increase log medical expenses by 0.449 and 0.098 units respectively. Conversely, an additional unit of age would lower log medical expenses by 0.012 units, and having health insurance would lower them by 0.852 units. All four independent variables have p-values below 0.05, which suggests that they are statistically significant and that their effects are unlikely to be due to chance alone.
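One caution when reading p-values from a Stage-2 regression: if the second stage is run by hand as an ordinary OLS on the predicted endogenous variable, the coefficients are the correct 2SLS estimates, but the reported standard errors (and hence p-values) are not, because the residual variance is computed from the predicted values instead of the actual endogenous variable. Dedicated IV routines apply this correction. The sketch below uses made-up data with a single regressor and a single instrument and assumes homoskedastic errors; the slide does not say whether its p-values came from a manual second stage.

```python
import math
import random
from statistics import mean


def fit_line(x, y):
    """Simple OLS with an intercept: returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def slope_se(resid, regressor):
    """Homoskedastic slope standard error: sqrt(s2 / Sxx), s2 = SSE / (n - 2)."""
    n = len(resid)
    s2 = sum(e * e for e in resid) / (n - 2)
    m = mean(regressor)
    sxx = sum((r - m) ** 2 for r in regressor)
    return math.sqrt(s2 / sxx)


random.seed(2)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]  # instrument
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved factor
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + 3 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: endogenous x on the instrument; Stage 2: y on the predicted x
a1, b1 = fit_line(z, x)
x_hat = [a1 + b1 * zi for zi in z]
a2, b2 = fit_line(x_hat, y)

# Naive residuals (what a plain second-stage OLS uses): built from x_hat
naive_resid = [yi - a2 - b2 * xh for yi, xh in zip(y, x_hat)]
# 2SLS residuals: same coefficients, but built from the actual x
iv_resid = [yi - a2 - b2 * xi for yi, xi in zip(y, x)]

se_naive = slope_se(naive_resid, x_hat)
se_2sls = slope_se(iv_resid, x_hat)
print(f"slope {b2:.3f}, naive SE {se_naive:.4f}, 2SLS SE {se_2sls:.4f}")
```

The naive standard error can be too large or too small depending on the data; only the version built from the actual regressor is the conventional 2SLS standard error.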
  • 15. INSIGHTS FROM OLS & STAGE-2 REGRESSION
    • In the Stage-1 analysis, the endogenous variable is regressed on the instrumental variable. Since the p-value for the instrumental variable is below 0.05, the instrumental variable is significantly related to the endogenous variable: the SSI ratio is associated with a -0.1998-unit change in health insurance.
    • This is known as the relevance condition for an instrumental variable: the instrument is correlated with the endogenous variable and can be used to predict it.
    • If the F-statistic could be calculated, using tools like R or Python, the strength of the instrument could be assessed further.
    The linear regression results suggest that people with health insurance would experience a 0.075-unit increase in log medical expenses, whereas the 2-stage results suggest a 0.852-unit decrease. These two estimates point in opposite directions, and endogeneity problems are suspected.
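The first-stage F-statistic can be computed with a few lines of Python. With a single instrument and only an intercept in the first stage, the F-statistic equals R² / (1 - R²) × (n - 2), which is also the square of the instrument's t-statistic; a widely used rule of thumb reads F below about 10 as a sign of a weak instrument. If other exogenous regressors are included in Stage 1, the relevant figure is the partial F-test on the excluded instrument(s) only, which this simplified sketch does not compute. The data and the function name `first_stage_f` are hypothetical.

```python
import random
from statistics import mean


def first_stage_f(z, x):
    """F-statistic for regressing x on an intercept and one instrument z."""
    n = len(z)
    mz, mx = mean(z), mean(x)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    szz = sum((a - mz) ** 2 for a in z)
    sxx = sum((b - mx) ** 2 for b in x)
    r2 = szx * szx / (szz * sxx)  # R-squared of a simple regression
    return r2 / (1 - r2) * (n - 2)


random.seed(3)
n = 1_000
z = [random.gauss(0, 1) for _ in range(n)]
strong_x = [0.5 * zi + random.gauss(0, 1) for zi in z]   # instrument matters a lot
weak_x = [0.02 * zi + random.gauss(0, 1) for zi in z]    # instrument barely matters

for label, x in [("strong", strong_x), ("weak", weak_x)]:
    f = first_stage_f(z, x)
    verdict = "weak (F < 10)" if f < 10 else "not weak by the F > 10 rule of thumb"
    print(f"{label} instrument: F = {f:.1f} -> {verdict}")
```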