Get more out of IBM SPSS Statistics, 11-11-14: Predicting with logistic regression, STATSCONsult
1. IBM SPSS presentation Amsterdam, 11th November 2014
Drs. Ing. J.A.C.M. Smit (Jan)
Director of STATSCONsult, based in Drunen, NL
11/11/2014
STATSCONsult, Logistic Regression, IBM SPSS presentation
2. STATSCONsult
Support, marketing and sales of software products for statistical analysis
Courses in statistics
Consultancy in data analysis
Jan Smit worked for SPSS from 1984 until 1989.
3. STATSCONsult Consultancy
SPSS Intro courses
SPSS assistance in data analyses
SPSS advanced courses
SPSS Risk Analyses (including Weight of Evidence)
4. Examples of Logistic Regression
We wish to model the probability of an event that depends on a number of factors (predictors):
◦Predicting whether a patient has (or will develop) a given disease
◦Predicting a customer's propensity to purchase an appliance (e.g. a TV)
◦Predicting whether a student passes an exam
◦Predicting whether a loan is paid back in full
◦Risk analysis is done with logistic regression
5. What are the assumptions of using Logistic Regression?
The predictors are not strongly correlated with each other (no severe multicollinearity)
A continuous predictor should have a monotone (increasing or decreasing) relationship with the probability of the dependent variable in the data
We obtain (model + error); the residuals (= error) should not dominate
The model should be interpretable, easy to use and useful for forecasting
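The multicollinearity assumption can be checked before fitting, for instance by inspecting pairwise correlations between predictors. A minimal sketch on synthetic data (the variables and the 0,8 threshold are illustrative assumptions, not taken from the bankloan file):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(18, 60, n)
employ = 0.4 * age + rng.normal(0, 3, n)  # deliberately correlated with age
income = rng.normal(50, 15, n)

X = np.column_stack([age, employ, income])
names = ["age", "employ", "income"]

# pairwise correlation matrix of the predictors
corr = np.corrcoef(X, rowvar=False)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(corr[i, j]) > 0.8:  # common rule-of-thumb threshold
            print(f"possible multicollinearity: {names[i]} vs {names[j]} (r={corr[i, j]:.2f})")
```

When a pair exceeds the threshold, one of the two variables is usually dropped or the pair is combined before estimating the logistic regression.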
6. Logistic regression, application
We analyse the effect of a number of independent predictors (x1, x2, .., xn) on a dependent variable Y, where Y ∈ {0, 1}
Covariates are predictors for which we wish to correct (such as age)
Predictors can be continuous, ordinal or nominal:
◦continuous, e.g. Age
◦ordinal or nominal, e.g. Level of Education
7. Data
1500 observations
We wish to model Previous Default (Y=1)
From now on we call Previous Default (the risk of not paying off a bank loan) "Risk"
Goals: interpret the model and use it for prediction
Based on the predictors (Age, .. , Household Income)
548 observations have Risk (Y=1) in our data set
90% of the observations are used to estimate the model; the remaining observations are used for prediction
8. What are my odds?
We cannot use ordinary linear regression directly, though we reuse much of its theory.
In logistic regression our model is:
◦log( P(y=1) / P(y=0) ) = a + b1*x1 + b2*x2 + .. + bn*xn
◦Linear regression: Y = a + b1*x1 + b2*x2 + .. + bn*xn (nearly the same form)
◦Odds: P(y=1) / P(y=0); "my odds are 2 to 1" means P(y=1)/P(y=0) = 2
Log(odds) makes the statistics work:
◦P = 2/3: odds = 2; log(2) = 0,69
◦P = 1/2: odds = 1; log(1) = 0
A coefficient is the change in the log odds when the other factors are held fixed.
In betting terms the odds can be against (odds < 1), on (odds > 1), or even (odds = 1).
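The odds and log-odds examples on this slide can be reproduced in a few lines of plain Python:

```python
import math

def odds(p):
    """Odds in favour of the event: P(y=1) / P(y=0)."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds (the logit)."""
    return math.log(odds(p))

# the two slide examples
print(round(odds(2 / 3), 4), round(log_odds(2 / 3), 2))  # odds 2, log-odds 0,69
print(round(odds(1 / 2), 4), round(log_odds(1 / 2), 2))  # odds 1, log-odds 0
```

The logit maps probabilities in (0, 1) onto the whole real line, which is what makes a linear model on the right-hand side possible.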
9. Bankloan data
We wish to model the chance of paying a bank loan back in full. When Risk = 1, the loan was in the end not repaid to the bank.
Y : Risk{1=yes, 0=no}
X : A number of factors that may affect Y
Age in years (age)
Level of education (ed)
Years with current employer (employ)
Years at current address (address)
Household income in thousands (income)
Debt-to-income ratio, x100 (debtinc)
Credit card debt in thousands (creddebt)
Other debt in thousands (othdebt)
10. Make groups via visual binning
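Visual Binning in SPSS assigns each case to an age group based on cut points. The same grouping can be sketched in Python; the cut points and labels below are illustrative assumptions, not the ones chosen in the dialog:

```python
import bisect

# hypothetical upper bounds of the age bins
cut_points = [25, 35, 45, 55]
labels = ["<=25", "26-35", "36-45", "46-55", ">55"]

def age_group(age):
    """Return the bin label for a given age."""
    return labels[bisect.bisect_left(cut_points, age)]

for age in (20, 30, 50, 60):
    print(age, age_group(age))
```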
11. Odds of risk decrease with higher values of age
12. Dependency of Age on Risk
13. And in formulas
We read: Log Odds = log( P(Risk=1) / P(Risk=0) ) = Constant + B * age =
1,250 – 0,055 * age
For age=20: LogOdds = 1,25 – 1,10 = 0,15
For age=30: LogOdds = 1,25 – 1,65 = –0,40
For age=40: LogOdds = 1,25 – 2,20 = –0,95
If age = 22,7 the LogOdds = 0
According to the model:
For age=20: Odds = exp(0,15) = 1,16
For age=30: Odds = exp(–0,40) = 0,67
For age=40: Odds = exp(–0,95) = 0,39
Probability, P(Y=1) = Odds / (1 + Odds):
For age=20: P(Y=1) = 0,54
For age=30: P(Y=1) = 0,40
For age=40: P(Y=1) = 0,28
We conclude Age can be used as a predictor for Risk (as Sig.-p < 0,05)
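The arithmetic on this slide can be verified directly (coefficients 1,250 and –0,055 taken from the slide):

```python
import math

b0, b1 = 1.250, -0.055  # constant and age coefficient from the slide

def log_odds(age):
    return b0 + b1 * age

def prob(age):
    """P(Risk=1) = odds / (1 + odds)."""
    o = math.exp(log_odds(age))
    return o / (1 + o)

for age in (20, 30, 40):
    print(age, round(log_odds(age), 2), round(math.exp(log_odds(age)), 2), round(prob(age), 2))
```

The log-odds cross zero (probability 0,5) at age = 1,250 / 0,055 ≈ 22,7, matching the slide.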
14. Usage of dialog in IBM SPSS:
16. Output (2) from the initial stage (all main effects). The –2 log likelihood is leading.
17. Output of predictors and effect on Risk
18. Which predictors can we use in the model to estimate "Risk"?
If Sig.-p < 0,05 for a predictor, we may conclude that this predictor has an effect on the dependent variable (a significant effect).
If Sig.-p > 0,05 for a predictor, we remain uncertain whether this predictor has an effect on the dependent variable.
Watch out for pitfalls (remove a variable that has no effect, and re-estimate).
19. Modelling
Using Backward (LR), at each step we re-estimate the model, leaving out a non-significant predictor:
After this step only the variables age, employ, debtinc and creddebt are significant
Note that correlations between predictors may affect the order in which variables enter or leave the model (employ and address are highly correlated)
21. Interpretation of the model
If the coefficient of a predictor < 0, the odds of risk decrease for larger values of that predictor.
Large coefficients (positive or negative) are more important (they go with large Wald statistics and small Sig.-p values).
Here, people with
1. a short period at the current employer ("Change") and
2. high credit card debts ("Spenders") and
3. a high debt-to-income ratio ("Have Fun") and
4. a low age ("Young")
show high risk.
22. Classification of the data in the model. If we adjust the cut value away from the default 0,5, the counts in the Predicted columns change (a lower cut value predicts more cases as Yes).
23. Model expression
The model is:
log( P(Risk=1) / P(Risk=0) ) = –0,133 (constant)
– 0,213 * employ (range 0 to 50)
+ 0,483 * creddebt (range 0 to 36)
+ 0,102 * debtinc (range 0 to 40)
– 0,040 * age (range 18 to 60)
If this expression > 0, the probability > 0,5
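The fitted equation can be written as a small scoring function (coefficients from the slide; the example applicant values are made up for illustration):

```python
import math

def log_odds(employ, creddebt, debtinc, age):
    """Linear predictor of the fitted model from the slide."""
    return (-0.133
            - 0.213 * employ
            + 0.483 * creddebt
            + 0.102 * debtinc
            - 0.040 * age)

def risk_probability(employ, creddebt, debtinc, age):
    """Logistic transform: log-odds > 0 corresponds to probability > 0,5."""
    return 1 / (1 + math.exp(-log_odds(employ, creddebt, debtinc, age)))

def classify(employ, creddebt, debtinc, age, cut=0.5):
    """Predict Risk=1 when the probability exceeds the cut value."""
    return int(risk_probability(employ, creddebt, debtinc, age) > cut)

# made-up applicant: 2 yrs employed, 3k credit debt, debt/income ratio 15, age 25
p = risk_probability(2, 3, 15, 25)
print(round(p, 3), classify(2, 3, 15, 25))
```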
24. Prediction
Prediction is rather good (102 out of 133 correct)
We apply the model to the remaining observations that were not used for estimation.
65 + 15 were actually classified as "No Risk"
65 + 16 are classified by the model as "No Risk"
We can change the cut-off value from the default 0,5
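How the cut-off value changes the classification table can be illustrated with made-up (observed, predicted probability) pairs; these are not the presentation's hold-out data:

```python
# made-up (observed, predicted_probability) pairs for illustration
cases = [(0, 0.1), (0, 0.3), (0, 0.6), (1, 0.4), (1, 0.7), (1, 0.9)]

def confusion(cases, cut):
    """Return (true_neg, false_pos, false_neg, true_pos) at a given cut value."""
    tn = fp = fn = tp = 0
    for y, p in cases:
        pred = int(p > cut)
        if y == 0 and pred == 0:
            tn += 1
        elif y == 0 and pred == 1:
            fp += 1
        elif y == 1 and pred == 0:
            fn += 1
        else:
            tp += 1
    return tn, fp, fn, tp

for cut in (0.3, 0.5, 0.7):
    print(cut, confusion(cases, cut))
```

Lowering the cut value moves cases from the predicted-No to the predicted-Yes column, trading false negatives for false positives.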
25. Comparison Classification Trees and Logistic Regression
If the number of variables is high, the LR result is still simple; CT output becomes large and complex.
CT finds interactions, i.e. segments with the highest P; with LR, segments with high probability are determined from the model.
26. Questions
Jan Smit
jan.smit@statsconsult.nl
+31 416 378 125
http://www.statsconsult.nl/