1. BIOSTATISTICS
CORRELATION AND REGRESSION
By Mr Payaam Vohra
NIPER AIR 11
Gold Medalist in MU
MET AIR 07
ICT MTECH SCORE RANK 01
CUET-PG AIR 01
IIT BHU AIR 08
GATE AND BITS HD QUALIFIED
GPAT AIR 43
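2. IV B.PHARMACY (BIO STATISTICS)
Formula for Karl Pearson's Coefficient of Correlation (r):
$$r = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{n}\right) \times \left(\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right)}} = \frac{\sum d_X d_Y}{\sqrt{\sum d_X^2 \times \sum d_Y^2}}$$
where $d_X = X - \bar{X}$ and $d_Y = Y - \bar{Y}$ are the deviations from the means.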
PROPERTIES OF CORRELATION:
1. The Karl Pearson coefficient of correlation lies between -1 and +1, i.e., $-1 \le r \le 1$.
2. The coefficient of correlation is independent of change of origin and scale.
3. Two independent random variables are uncorrelated, but the converse is not true.
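Property 2 can be checked numerically. The sketch below is only an illustration: the data values, the shift constants and the helper name `pearson_r` are assumptions made for the example, and the scale factors are taken to be positive (a negative factor would flip the sign of r).

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r computed from raw sums (the computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative values only (not taken from the problems in these notes).
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]

r_original = pearson_r(x, y)

# Change of origin (subtract a constant) and of scale (divide by a positive constant).
u = [(xi - 5.0) / 2.0 for xi in x]
v = [(yi - 3.0) / 0.5 for yi in y]
r_shifted = pearson_r(u, v)

print(round(r_original, 4), round(r_shifted, 4))
```

Both printed values agree, and the value lies inside the limits stated in property 1.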
PROBLEMS ON CORRELATION
Problem-21: The following data relate to the pod length and the number of seeds per pod. Calculate the correlation coefficient for the data.
Pod length (cm):       4.5   4   5.2   4.6   5.2   5.2   4.3   4   4.5   5.5
No. of seeds/plant:    5     5   6     6     6     7     4     4   5     6
SOLUTION:
Pod length (X)   No. of seeds/plant (Y)   X^2      Y^2   XY
4.5              5                        20.25    25    22.5
4                5                        16       25    20
5.2              6                        27.04    36    31.2
4.6              6                        21.16    36    27.6
5.2              6                        27.04    36    31.2
5.2              7                        27.04    49    36.4
4.3              4                        18.49    16    17.2
4                4                        16       16    16
4.5              5                        20.25    25    22.5
5.5              6                        30.25    36    33
Totals: $\sum X = 47$, $\sum Y = 54$, $\sum X^2 = 223.52$, $\sum Y^2 = 300$, $\sum XY = 257.6$
$$\sum d_X^2 = \sum X^2 - \frac{(\sum X)^2}{n} = 223.52 - \frac{(47)^2}{10} = 2.62$$
$$\sum d_Y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = 300 - \frac{(54)^2}{10} = 8.4$$
$$\sum d_X d_Y = \sum XY - \frac{\sum X \sum Y}{n} = 257.6 - \frac{47 \times 54}{10} = 3.8$$
$$r = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{n}\right) \times \left(\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right)}} = \frac{\sum d_X d_Y}{\sqrt{\sum d_X^2 \times \sum d_Y^2}} = \frac{3.8}{\sqrt{2.62 \times 8.4}} = 0.81$$
Since r > 0, the given data show a positive correlation.
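The same quantities can be obtained from deviations about the means instead of raw sums. The following is a minimal sketch using the Problem-21 data; the variable names are illustrative choices, not part of the original notes.

```python
import math

# Data from Problem-21.
pod_length = [4.5, 4, 5.2, 4.6, 5.2, 5.2, 4.3, 4, 4.5, 5.5]
seeds = [5, 5, 6, 6, 6, 7, 4, 4, 5, 6]

n = len(pod_length)
mean_x = sum(pod_length) / n
mean_y = sum(seeds) / n

# Deviations from the means: d_X = X - mean(X), d_Y = Y - mean(Y).
dx = [x - mean_x for x in pod_length]
dy = [y - mean_y for y in seeds]

sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)
print(round(sum_dx2, 2), round(sum_dy2, 2), round(sum_dxdy, 2), round(r, 2))
```

The deviation sums match the shortcut expressions such as $\sum X^2 - (\sum X)^2/n$, which is why both routes lead to the same r.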
Problem-22: Calculate the correlation coefficient between X and Y from the following data.
X:   5    9    13   17   21
Y:   12   20   25   33   35
SOLUTION:
X     Y     X^2    Y^2    XY
5     12    25     144    60
9     20    81     400    180
13    25    169    625    325
17    33    289    1089   561
21    35    441    1225   735
Totals: $\sum X = 65$, $\sum Y = 125$, $\sum X^2 = 1005$, $\sum Y^2 = 3483$, $\sum XY = 1861$
$$\sum d_X^2 = \sum X^2 - \frac{(\sum X)^2}{n} = 1005 - \frac{(65)^2}{5} = 160$$
$$\sum d_Y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = 3483 - \frac{(125)^2}{5} = 358$$
$$\sum d_X d_Y = \sum XY - \frac{\sum X \sum Y}{n} = 1861 - \frac{65 \times 125}{5} = 236$$
$$r = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{n}\right) \times \left(\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right)}} = \frac{\sum d_X d_Y}{\sqrt{\sum d_X^2 \times \sum d_Y^2}} = \frac{236}{\sqrt{160 \times 358}} = 0.9861$$
Since r > 0, the given data show a positive correlation.
Problem-23: Calculate the correlation coefficient between the heights of fathers and daughters from the following Anuragian family members. (Homework)
Heights of fathers:     64   65   66   67   68   69   70
Heights of daughters:   66   67   68   69   70   71   72
MULTIPLE CORRELATION
The quantitative assessment of the magnitude and direction of the correlation between a given variable and the joint influence of two or more other variables is called multiple correlation.
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2}}$$
The squared value of the multiple correlation coefficient ($R_{1.23}^2$) is called the coefficient of determination.
Problem-24: The product-moment correlation ($r_{12}$) between gill weight (X1) and body weight (X2) was found to be 0.80 in a sample of 33 fishes; the correlation ($r_{13}$) between gill weight (X1) and body length (X3) amounted to 0.20, while the correlation ($r_{23}$) between body weight (X2) and body length (X3) was found to be 0.30. Find whether there is a significant multiple linear correlation of X1 with X2 and X3 [$\alpha$ = 0.05].
SOLUTION:
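The worked solution is not reproduced in these notes. As a hedged sketch, the multiple correlation coefficient can be evaluated from the three pairwise correlations with the $R_{1.23}$ formula, and its significance assessed with the usual F-test for a multiple correlation. The choice of that F-test, the variable names and the tabulated critical value are assumptions for illustration, not something stated in the source.

```python
import math

# Pairwise correlations and sample size from Problem-24.
r12, r13, r23 = 0.80, 0.20, 0.30
n = 33
k = 2  # number of predictor variables (X2 and X3)

# Squared multiple correlation of X1 on X2 and X3.
r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
multiple_r = math.sqrt(r_squared)

# Assumed test statistic: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)),
# compared with the tabulated F at alpha = 0.05 with (k, n - k - 1) = (2, 30) d.f.
f_cal = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
F_TAB = 3.32  # value read from a standard F table

print(round(multiple_r, 4), round(f_cal, 2), f_cal > F_TAB)
```

Under these assumptions the calculated F is far larger than the tabulated value, which would point to a significant multiple correlation.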
UNIT-II
CURVE FITTING
Suppose that data are given for two variables x and y. The problem of finding an analytical expression of the form y = f(x) that fits the given data is called curve fitting.
(Or)
Curve fitting means expressing the exact relationship between two variables by an algebraic equation. This relationship is the equation of the curve.
Curve fitting means forming the equation of a curve from the given data.
Principle of the least squares technique:
“The sum of the squares of the differences between the observed values and the expected values should be a minimum.” This sum is called the residual error,
i.e., $E = \sum [y - f(x)]^2$ is a minimum.
The method of least squares aims at minimizing the value of the error E.
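To make the principle concrete, the sketch below fits a straight line y = a + bx by solving the normal equations and then confirms that nudging the coefficients only increases E. The data points and the function names `fit_line` and `residual_error` are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y = a + b*x from the normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

def residual_error(xs, ys, a, b):
    """E = sum of squared differences between observed y and fitted a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative data only (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
best_error = residual_error(xs, ys, a, b)
nudged_error = residual_error(xs, ys, a + 0.1, b - 0.05)

print(round(a, 3), round(b, 3), round(best_error, 4), round(nudged_error, 4))
```

Any other choice of a and b gives a residual error at least as large as the fitted one, which is exactly what "least squares" means.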
REGRESSION
Regression is used to denote the estimation or prediction of the average value of one variable for a specified value of the other variable. One of the variables is called the independent variable or the explanatory variable, and the other is called the dependent variable or the explained variable.
Definition of Regression: Regression is the measure of the average relationship between two or more variables in terms of the original units of the data.
LINES OF REGRESSION:
Lines of regression are the lines that give the best estimate of the value of one variable for any given value of the other variable. In the case of two variables X and Y, we have two lines of regression:
1. Regression line Y on X
2. Regression line X on Y
(1) Regression Line Y on X:
The regression equation or form of the line Y on X is Y = a + bX, where ‘Y’ is the dependent variable, ‘X’ is the independent variable, and ‘a’ and ‘b’ are unknown constants.
(2) Regression Line X on Y:
The regression equation or form of the line X on Y is X = a + bY, where ‘X’ is the dependent variable, ‘Y’ is the independent variable, and ‘a’ and ‘b’ are unknown constants.
There are two types of regression equations:
1. Regression equation of X on Y is
$$(x - \bar{x}) = b_{xy}\,(y - \bar{y})$$
where: x = value of x
$\bar{x}$ = mean of x
y = value of y
$\bar{y}$ = mean of y
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum d_x d_y}{\sum d_y^2}$$
$\sigma_x$ = standard deviation of the x series
$\sigma_y$ = standard deviation of the y series
2. Regression equation of Y on X is
$$(y - \bar{y}) = b_{yx}\,(x - \bar{x})$$
where: x = value of x
$\bar{x}$ = mean of x
y = value of y
$\bar{y}$ = mean of y
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum d_x d_y}{\sum d_x^2}$$
$\sigma_x$ = standard deviation of the x series
$\sigma_y$ = standard deviation of the y series
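As an illustration of the two regression equations, the sketch below computes $b_{yx}$ and $b_{xy}$ for the Problem-22 data and rewrites each line in intercept form. The variable names and the intercept rearrangement (a = mean of the dependent variable minus b times the mean of the independent variable) are choices made for this example.

```python
# Data from Problem-22.
x = [5, 9, 13, 17, 21]
y = [12, 20, 25, 33, 35]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sum_dx2 = sum((xi - mean_x) ** 2 for xi in x)
sum_dy2 = sum((yi - mean_y) ** 2 for yi in y)
sum_dxdy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Regression coefficients.
b_yx = sum_dxdy / sum_dx2  # slope of the line Y on X
b_xy = sum_dxdy / sum_dy2  # slope of the line X on Y

# Intercept form: (y - mean_y) = b_yx (x - mean_x)  ->  y = a_yx + b_yx * x
a_yx = mean_y - b_yx * mean_x
# Intercept form: (x - mean_x) = b_xy (y - mean_y)  ->  x = a_xy + b_xy * y
a_xy = mean_x - b_xy * mean_y

print(f"Y on X: y = {a_yx:.4f} + {b_yx:.4f} x")
print(f"X on Y: x = {a_xy:.4f} + {b_xy:.4f} y")
```

The two lines are different unless the correlation is perfect; both pass through the point of means.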
PROPERTIES OF REGRESSION COEFFICIENT:
1. The correlation coefficient is the geometric mean of the two regression coefficients, i.e., $r = \pm\sqrt{b_{YX} \times b_{XY}}$, with r taking the common sign of the two coefficients.
2. The arithmetic mean of the two regression coefficients is greater than or equal to the correlation coefficient, i.e., $\dfrac{b_{YX} + b_{XY}}{2} \ge r$.
3. Since the coefficient of correlation cannot exceed one, if one of the regression coefficients is greater than one, the other must be less than one.
4. Both regression coefficients have the same sign, either positive or negative.
5. If one regression coefficient is positive, the other is also positive, and vice versa.
6. Regression coefficients are independent of change of origin, but not of scale.
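Properties 1 to 4 can be verified with the deviation sums obtained in Problem-22 ($\sum d_X^2 = 160$, $\sum d_Y^2 = 358$, $\sum d_X d_Y = 236$). This is only a numerical check on one data set, not a proof, and the variable names are illustrative.

```python
import math

# Deviation sums from the Problem-22 solution.
sum_dx2, sum_dy2, sum_dxdy = 160.0, 358.0, 236.0

b_yx = sum_dxdy / sum_dx2
b_xy = sum_dxdy / sum_dy2
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Property 1: r is the geometric mean of the regression coefficients,
# carrying their common sign.
geometric_mean = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Property 2: the arithmetic mean is at least as large as r.
arithmetic_mean = (b_yx + b_xy) / 2

print(round(b_yx, 4), round(b_xy, 4))
print(round(geometric_mean, 4), round(r, 4), round(arithmetic_mean, 4))
```

Here $b_{YX}$ exceeds one while $b_{XY}$ is below one, in line with property 3, and both are positive, in line with property 4.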
USES OF REGRESSION ANALYSIS:
1. Regression analysis is very useful in predicting the probable value of an unknown variable in response to some known related variable.
2. Regression is useful in establishing the nature of the relationship between the two variables.
3. Regression analysis is extensively used for measuring and estimating the relationship among variables.
4. Regression analysis provides the regression coefficients, which are generally used in the calculation of the correlation coefficient.
CORRELATION ANALYSIS “V/S” REGRESSION ANALYSIS
1. Correlation analysis: attempts to determine the degree of relationship between the two variables.
   Regression analysis: is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.
2. Correlation analysis: tests the closeness of the variables.
   Regression analysis: measures the extent of change in the dependent variable due to change in the independent variable.
3. Correlation analysis: studies whether the variables move in the same direction or in opposite directions.
   Regression analysis: studies the variables by taking into consideration the cause-and-effect relationship between them.
4. Correlation analysis: there is a chance of nonsense correlation between the two variables.
   Regression analysis: there is no chance of such a relation existing between the two variables.
5. Correlation analysis: the correlation coefficient is independent of change of origin and scale.
   Regression analysis: the regression coefficients are independent of change of origin but not of scale.
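Standard Error of Estimate (or) Regression:
(1) Standard Error of Estimate of X is given by
$$S_X = \sigma_X \sqrt{1 - r^2}$$
(2) Standard Error of Estimate of Y is given by
$$S_Y = \sigma_Y \sqrt{1 - r^2}$$
where
$$\sigma_X = \sqrt{\frac{\sum d_x^2}{n}}, \qquad \sum d_x^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$\sigma_Y = \sqrt{\frac{\sum d_y^2}{n}}, \qquad \sum d_y^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$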
PROBLEMS ON REGRESSION
PROBLEM-1: Height and weight are recorded for 10 students. The results are given below.
(i) Calculate the correlation coefficient and test its level of significance.
(ii) Obtain the regression equations for X on Y and Y on X, and also calculate the regression coefficient and test its level of significance.
(iii) Calculate the Standard Error of Estimate (or) Regression.
HEIGHT:   62   72   78   58   65   70   66   63   60   72
WEIGHT:   50   65   63   50   54   60   61   55   54   65
SOLUTION:
Height (x)   Weight (y)   x^2     y^2     xy
62           50           3844    2500    3100
72           65           5184    4225    4680
78           63           6084    3969    4914
58           50           3364    2500    2900
65           54           4225    2916    3510
70           60           4900    3600    4200
66           61           4356    3721    4026
63           55           3969    3025    3465
60           54           3600    2916    3240
72           65           5184    4225    4680
Totals: $\sum x = 666$, $\sum y = 577$, $\sum x^2 = 44710$, $\sum y^2 = 33597$, $\sum xy = 38715$
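(i) Now we have to find
$$\sum d_x^2 = \sum x^2 - \frac{(\sum x)^2}{n} = 44710 - \frac{(666)^2}{10} = 354.4$$
$$\sum d_y^2 = \sum y^2 - \frac{(\sum y)^2}{n} = 33597 - \frac{(577)^2}{10} = 304.1$$
$$\sum d_x d_y = \sum xy - \frac{\sum x \sum y}{n} = 38715 - \frac{666 \times 577}{10} = 286.8$$
$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right) \times \left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \times \sum d_y^2}} = \frac{286.8}{\sqrt{354.4 \times 304.1}} = 0.8736$$
Since r > 0, the given data show a positive correlation.
Significance Test:
$$t_{Cal} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \sim t_{(n-2)\ \text{d.f.}}$$
$$t_{Cal} = \frac{0.8736\sqrt{10 - 2}}{\sqrt{1 - (0.8736)^2}} = 5.0774$$
The tabulated value at the 5% level of significance with (n - 2) = 8 d.f. is 2.31.
Inference: if $t_{Cal} > t_{Tab}$, then we reject the null hypothesis (accept the alternative hypothesis),
i.e., 5.0774 > 2.31.
There is a significant correlation between height and weight.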
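The t-statistic for the correlation coefficient is a one-line computation. The sketch below uses the rounded r from the solution and the tabulated value quoted in the text; the variable names are illustrative.

```python
import math

r = 0.8736   # correlation coefficient from part (i)
n = 10       # number of students
T_TAB = 2.31 # tabulated t at the 5% level with n - 2 = 8 d.f. (from the text)

# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
t_cal = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_null = t_cal > T_TAB

print(round(t_cal, 4), reject_null)
```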
(ii) Now we have to find the regression coefficients, where:
$\bar{x}$ = mean of x = 666 / 10 = 66.6
$\bar{y}$ = mean of y = 577 / 10 = 57.7
$$b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2} = \frac{286.8}{304.1} = 0.9431$$
$$b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2} = \frac{286.8}{354.4} = 0.8093$$
1. Regression equation of X on Y is
$(x - \bar{x}) = b_{xy}\,(y - \bar{y})$
(x - 66.6) = 0.9431 (y - 57.7)
x - 66.6 = 0.9431 y - 54.4169
x = 0.9431 y - 54.4169 + 66.6
x = 12.1831 + 0.9431 y
2. Regression equation of Y on X is
$(y - \bar{y}) = b_{yx}\,(x - \bar{x})$
(y - 57.7) = 0.8093 (x - 66.6)
y - 57.7 = 0.8093 x - 53.8994
y = 0.8093 x - 53.8994 + 57.7
y = 3.8006 + 0.8093 x
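Once the regression lines are known they can be used for prediction. The sketch below recomputes the coefficients from the Problem-1 data, checks that both lines pass through the point of means, and predicts a weight for one height. The function names and the height of 68 are hypothetical, chosen only to illustrate the use of the equations.

```python
# Data from PROBLEM-1.
heights = [62, 72, 78, 58, 65, 70, 66, 63, 60, 72]
weights = [50, 65, 63, 50, 54, 60, 61, 55, 54, 65]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

sum_dx2 = sum((x - mean_x) ** 2 for x in heights)
sum_dy2 = sum((y - mean_y) ** 2 for y in weights)
sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))

b_yx = sum_dxdy / sum_dx2  # regression coefficient of Y on X
b_xy = sum_dxdy / sum_dy2  # regression coefficient of X on Y

def predict_weight(height):
    """Estimate weight from height using the line Y on X."""
    return mean_y + b_yx * (height - mean_x)

def predict_height(weight):
    """Estimate height from weight using the line X on Y."""
    return mean_x + b_xy * (weight - mean_y)

NEW_HEIGHT = 68  # hypothetical value, not from the data set
print(round(b_yx, 4), round(b_xy, 4), round(predict_weight(NEW_HEIGHT), 2))
```

Note that the line Y on X is the one to use when predicting weight from height, and the line X on Y when predicting height from weight.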
Significance Test:
Now, we have to find
$$\text{S.E. of } b_{yx} = \sqrt{\frac{\dfrac{1}{n - 2}\left(\sum d_y^2 - \dfrac{(\sum d_x d_y)^2}{\sum d_x^2}\right)}{\sum d_x^2}} = \sqrt{\frac{\dfrac{1}{10 - 2}\left(304.1 - \dfrac{(286.8)^2}{354.4}\right)}{354.4}} = 0.1594$$
$$t_{Cal} = \frac{b_{yx}}{\text{S.E. of } b_{yx}} = \frac{0.8093}{0.1594} = 5.077$$
The tabulated value at the 5% level of significance with (n - 2) = 8 d.f. is 2.31.
Inference: if $t_{Cal} > t_{Tab}$, then reject the null hypothesis (accept the alternative hypothesis),
i.e., 5.077 > 2.31.
The regression coefficient is significant.
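The standard error of $b_{yx}$ and its t-statistic can be recomputed directly from the three deviation sums of Problem-1. This is a sketch with illustrative variable names; small differences in the last decimal place relative to hand calculation come from rounding.

```python
import math

n = 10
# Deviation sums from the Problem-1 solution.
sum_dx2, sum_dy2, sum_dxdy = 354.4, 304.1, 286.8

b_yx = sum_dxdy / sum_dx2

# Residual sum of squares about the line Y on X.
residual_ss = sum_dy2 - sum_dxdy ** 2 / sum_dx2

# S.E. of b_yx = sqrt( (residual_ss / (n - 2)) / sum_dx2 )
se_b_yx = math.sqrt(residual_ss / (n - 2) / sum_dx2)
t_cal = b_yx / se_b_yx

print(round(se_b_yx, 4), round(t_cal, 3))
```

For simple linear regression this t-statistic is algebraically the same as the one used to test the correlation coefficient, so the two tests always lead to the same conclusion.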
(iii) Standard Error of Estimate (or) Regression:
where
$$\sigma_X = \sqrt{\frac{\sum d_x^2}{n}} = \sqrt{\frac{354.4}{10}} = 5.953$$
$$\sigma_Y = \sqrt{\frac{\sum d_y^2}{n}} = \sqrt{\frac{304.1}{10}} = 5.515$$
(1) Standard Error of Estimate of X is given by
$$S_X = \sigma_X \sqrt{1 - r^2} = 5.953\sqrt{1 - (0.8736)^2} = 2.897$$
(2) Standard Error of Estimate of Y is given by
$$S_Y = \sigma_Y \sqrt{1 - r^2} = 5.515\sqrt{1 - (0.8736)^2} = 2.684$$