spss.pptx

RECODING VARIABLES &
ANOVA IN SPSS

The Data Editor of SPSS consists of two windows:
- The Data View spreadsheet allows data to be entered and viewed.
Each variable occupies a column of this spreadsheet.

- The Variable View spreadsheet allows the properties of each variable
to be defined and viewed. Each variable occupies a row of this
spreadsheet.
1- Name: the variable name, which must be defined using
alphanumeric characters (spaces and most special characters
are not allowed).
2- Type: the data type of the variable, including
various formats for numeric data, dates, or currencies.
3- Width: the width of the actual data entries
(number of digits or characters).
4- Decimals: the number of digits to the right of
the decimal point to be displayed for data entries.
5- Label: a description attached to the variable name to remind
the user what the variable means.
6- Values: defines the codes of a categorical variable, such as
a sex variable with the categories male and female.

7- Missing: defines the codes to be treated as missing data,
typically values that lie outside your actual data range.
8- Columns: the width of the variable's column in
the Data View spreadsheet.
9- Align: how the variable's entries are aligned in
the Data View spreadsheet.
10- Measure: the measurement scale of the
variable.
Numeric variable → Scale
String variable → Nominal
Categorical variable → Nominal (unordered) or Ordinal (ordered)

Recoding Variables in SPSS
Recoding is the process of creating a new variable from variables already available in the
dataset. This enables the user to work with some variables in a different form. For instance, the user
might want to group an income or age variable into categories.
There are two types of recoding:
1- Into the same variable (the original data are permanently replaced by the new data)
2- Into a different variable (a new variable is created to hold the recoded data)
Let us start with the following example:

To recode the age variable into three categories:
(20-25) → Group A
(26-30) → Group B
(31-40) → Group C
From the Transform menu → Recode into Different Variables
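
The same mapping can be sketched outside SPSS to make the logic explicit. The following is a minimal Python illustration, not SPSS syntax; the function name `recode_age` and the sample ages are hypothetical. It mirrors the "into different variable" option: the original values are left untouched and the group labels go into a new list.

```python
def recode_age(age):
    """Return the group label for an age, or None if it is outside 20-40."""
    if 20 <= age <= 25:
        return "Group A"
    if 26 <= age <= 30:
        return "Group B"
    if 31 <= age <= 40:
        return "Group C"
    return None  # no rule matches this value, so it gets no group


ages = [22, 25, 26, 30, 31, 40, 45]          # hypothetical sample values
age_group = [recode_age(a) for a in ages]    # new variable; 'ages' is unchanged
print(age_group)
```

Ages that match none of the three ranges (45 in this sample) receive no label, which makes it easy to spot values that the recoding rules do not cover.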


Analysis of Variance (ANOVA)
An ANOVA test is a way to find out whether experimental results are significant. In other words,
it helps you decide whether to reject the null hypothesis (all group means are equal) in favor of
the alternative hypothesis (at least one group mean differs). Basically, you are testing groups to see
if there is a difference between them.
Examples of when you might want to test different groups:
- A group of patients is trying three different therapies. You want to see if one therapy is
better than the others.
- A manufacturer has two different processes for making light bulbs. They want to know if one process is
better than the other.
- Students from different colleges take the same exam. You want to see if one college outperforms the
others.

ANOVA is a parametric test, so certain assumptions must hold for it to be used correctly.
The assumptions are:
1- Normality: the simplest sign of normality is that the mean, median and
trimmed mean are similar. For a perfectly normal distribution, all three measures of central
tendency are equal. If the mean is considerably above or below the median and
trimmed mean, the data are skewed (asymmetrically distributed).
2- Homogeneity: the variances of the groups being compared are similar (the homogeneity-of-variance
test, Levene's test in SPSS, gives a p-value > 0.05).
3- Independence: the data values are not related to one another. That is, no two data items measure the
same thing or produce the same effect.

Analysis of Variance (ANOVA) Example
A farmer wants to know if the weight of parsley plants is influenced by using a fertilizer. He selects 90
plants and randomly divides them into three groups of 30 plants each. He applies a biological
fertilizer to the first group, a chemical fertilizer to the second group and no fertilizer at all to the third
group. After a month he weighs all the plants, and he wants to determine whether the fertilizer affects weight.
1- Quick Normality Check of Data
Fertilizer    Mean     Median
None          51.2     50
Biological    53.63    53.5
Chemical      56.96    57.5
In each group the mean and median are close, which suggests roughly symmetric distributions.

1- Quick Normality Check of Data (continued)
It is also possible to check the distribution of the data visually
by drawing a histogram:
Graphs → Legacy Dialogs → Histogram

Analyze → Compare Means → One-Way ANOVA

Test of homogeneity of variances: p-value > 0.05, so the group variances can be treated as similar.
ANOVA table: p-value < 0.05, which means there are significant differences among the
three fertilizer methods, and the weights of the parsley plants are affected by the
fertilizer. We need to explore where the differences are, so we carry out a
post-hoc analysis.

From the multiple comparisons table we see that no fertilizer and the
chemical fertilizer have significantly different effects on the weight of
parsley plants (p<0.008).
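
To make the ANOVA table less of a black box, the F statistic it reports can be sketched by hand. The following minimal Python example uses made-up toy numbers (not the farmer's data, which are not listed here) and a hypothetical helper `one_way_anova_f`. It computes the between-group and within-group mean squares and their ratio; it does not compute the p-value, which requires the F distribution (SPSS reports it in the "Sig." column).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)              # number of groups
    n_total = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]   # illustrative values only
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)   # 3.0 2 6
```

A large F means the group means differ by more than the spread inside the groups would explain; whether it is large enough is what the p-value decides.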

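Calculate the mean, median and 20% trimmed mean for the number set x={8, 3, 7, 1, 3, 9}:
Sample size = 6, then:
1- Mean:
$\text{mean} = \frac{\sum_i x_i}{\text{sample size}} = \frac{8+3+7+1+3+9}{6} = \frac{31}{6} \approx 5.17$
2- To find the median, first arrange the sample in ascending order:
The ordered set is {1,3,3,7,8,9}, and since the sample size is even the median is the average of the two middle values:
$\text{median} = \frac{3+7}{2} = 5$
3- To find the trimmed mean:
Trimmed count = trimmed mean percent × sample size = 0.2 × 6 = 1.2, which is rounded to 1
Then remove the first and last numbers from {1,3,3,7,8,9} → {3,3,7,8}
Now we can calculate the mean of the remaining values:
$\text{trimmed mean} = \frac{\sum_i x_i}{\text{sample size}} = \frac{3+3+7+8}{4} = \frac{21}{4} = 5.25$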
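
The worked example above can be checked with a few lines of Python. This is a sketch using only the standard library; `trimmed_mean` is a hypothetical helper written for this illustration, and it rounds the trimmed count down, as in the calculation above.

```python
from statistics import mean, median


def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))       # values to drop per end, rounded down
    return mean(ordered[k:len(ordered) - k])


x = [8, 3, 7, 1, 3, 9]
print(round(mean(x), 2))      # 5.17
print(median(x))              # 5.0
print(trimmed_mean(x, 0.2))   # 5.25
```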

Descriptive Statistics
Descriptive statistics can be used to summarize the data. They are available from
Analyze → Descriptive Statistics
If your data are categorical, use the Frequencies or Crosstabs procedures to obtain the following
statistics; the Descriptives procedure is intended for scale (continuous) data:

The Frequencies
1- Quartiles:
- Q1 is the middle value between the smallest value and
the median of the data set (it splits off the lowest 25% of the data from
the highest 75%).
- Q2 is the median of the data (it cuts the data set in half).
- Q3 is the middle value between the median and the highest value of
the data set (it splits off the highest 25% of the data from the lowest 75%).
The main advantage of quartiles:
The median is a measure of the central tendency of the data, but it says nothing about
how the data are distributed on either side of the median. Quartiles help us measure
this.
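
As a rough illustration of the quartiles described above, Python's standard library can compute them for the small number set {8, 3, 7, 1, 3, 9} used elsewhere in these slides. Note that quartile conventions differ between textbooks and software: `statistics.quantiles` interpolates between neighbouring values, so its Q1 and Q3 need not equal the "median of each half" values, and SPSS output may differ slightly as well.

```python
from statistics import median, quantiles

x = [8, 3, 7, 1, 3, 9]

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = quantiles(x, n=4)   # default method is "exclusive"
iqr = q3 - q1                    # interquartile range: spread of the middle 50%

print(q1, q2, q3)   # 2.5 5.0 8.25
print(iqr)          # 5.75
```

Q2 equals the median (5), while Q1 and Q3 show how far the data extend on each side of it.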

The Frequencies (continued)
2- Mean
3- Median
4- Mode: the most frequently occurring value in a set of numbers.
5- Range: the difference between the largest and smallest values; it gives a quick sense of the
spread (dispersion) of the data.
6- Standard deviation: quantifies the amount of variation or dispersion in a set of data values.
A low standard deviation indicates that the data points tend to be close to the mean of the set, while a
high standard deviation indicates that the data points are spread out over a wider range of values.
7- Variance: measures how far a set of (random) numbers is spread out from its average value; it is
the square of the standard deviation.
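
These measures can be sketched with Python's standard library, again using the number set {8, 3, 7, 1, 3, 9} for illustration. The sketch assumes the sample versions of variance and standard deviation (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute.

```python
from statistics import mode, stdev, variance

x = [8, 3, 7, 1, 3, 9]

data_mode = mode(x)               # most frequent value
data_range = max(x) - min(x)      # largest minus smallest
sample_var = variance(x)          # sample variance (n - 1 in the denominator)
sample_sd = stdev(x)              # square root of the sample variance

print(data_mode, data_range)                      # 3 8
print(round(sample_var, 2), round(sample_sd, 2))  # 10.57 3.25
```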

The Frequencies (continued)
8- Skewness is a measure of the asymmetry of a distribution. A symmetrical dataset has a
skewness equal to 0, so a normal distribution has a skewness of 0. Skewness essentially
measures the relative size of the two tails.
9- Kurtosis measures the heaviness or lightness of the tails of your data; it indicates whether your data
look flatter (or less flat) compared to the normal distribution. As reported by SPSS (excess kurtosis), the
value is 0 for a normal distribution.
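
The two shape measures can be sketched in plain Python. The functions below are hypothetical helpers that use the bias-adjusted sample formulas common in statistical packages; they are assumed here to correspond to what SPSS reports, so small differences from SPSS output are possible. The kurtosis function needs at least four values.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Bias-adjusted sample skewness (0 for a symmetric sample)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (about 0 for normal data)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((v - m) / s) ** 4 for v in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


symmetric = [1, 2, 3, 4, 5]          # illustrative values only
right_skewed = [1, 1, 2, 2, 3, 10]   # one large value stretches the right tail

print(sample_skewness(symmetric))      # 0 for a symmetric sample
print(sample_skewness(right_skewed))   # positive: longer right tail
print(sample_excess_kurtosis(symmetric))
```

A positive skewness points to a longer right tail and a negative one to a longer left tail; a negative excess kurtosis, as for the evenly spaced sample, points to lighter tails than a normal distribution.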
