SlideShare a Scribd company logo
1 of 40
Download to read offline
Data Analysis Course
Testing of Hypothesis
Venkat Reddy
Data Analysis Course
•   Data analysis design document
•   Introduction to statistical data analysis
•   Descriptive statistics
•   Data exploration, validation & sanitization
•   Probability distributions examples and applications




                                                                Venkat Reddy
                                                          Data Analysis Course
•   Simple correlation and regression analysis
•   Multiple liner regression analysis
•   Logistic regression analysis

• Testing of hypothesis
•   Clustering and decision trees
•   Time series analysis and forecasting
•   Credit Risk Model building-1                                 2
•   Credit Risk Model building-2
Note
• This presentation is just class notes. The course notes for Data
  Analysis Training is by written by me, as an aid for myself.
• The best way to treat this is as a high-level summary; the
  actual session went more in depth and contained other




                                                                           Venkat Reddy
                                                                     Data Analysis Course
  information.
• Most of this material was written as informal notes, not
  intended for publication
• Please send questions/comments/corrections to
  venkat@trenwiseanalytics.com or 21.venkat@gmail.com
• Please check my website for latest version of this document
                                         -Venkat Reddy                      3
Contents
• What is the need of testing
• Recap of sampling distribution
• Hypothesis testing
   • Five main steps in testing
       •   Assumptions
       •   Hypotheses




                                         Venkat Reddy
                                   Data Analysis Course
       •   Test Statistic
       •   P-value (P)
       •   Conclusion:
• Testing example
• Types of errors
• Testing for Means
   • Z test
   • T test
• Testing for Proportions
• Test of independence                    4
Inference
• A cake, (weighs 20 Kg) how do you decide about its taste? A piece of
  cake or after eating completely
• A truck half filled with oranges rest with apples. How would you
  verify that? By counting them?
• Apple & Samsung sell mobiles all over the world. If you want to find




                                                                               Venkat Reddy
                                                                         Data Analysis Course
  which one people prefer, Do you take the opinion poll from all the
  users around the world?
• Product manager claims that girls are buying their product more
  than boys? Is there any association between gender and buying or
  not buying?



                                                                                5
Inference
• A cake, (weighs 20 Kg) how do you decide about its taste? A piece of cake or
  after eating completely
   • You took a piece of cake, you are not completely satisfied with the taste –Would
     you recommend the cake?
   • You took a 10 gram piece of cake, you didn’t like the taste—What would you say?
   • You took a 1 milligram piece of cake, you liked the taste—What would you say?




                                                                                              Venkat Reddy
                                                                                        Data Analysis Course
• A truck half filled with oranges rest with apples (owner claims 10,000
  oranges & 10,000 apples) . How would you verify that? By counting them?
   • You randomly picked 200 fruits, 120 are apples and 80 are oranges. What would
     you say?
   • You randomly picked 200 fruits, 190 are apples and 10s are oranges. What would
     you say?


   How bad in the sample is bad enough to say the population is also
   bad                                                                                         6
Statistical Inference
• Inferences about a population are made on the basis of results
  obtained from a sample drawn from that population
• Want to talk about the larger population from which the
  subjects are drawn, not the particular subjects!




                                                                                   Venkat Reddy
                                                                             Data Analysis Course
                            • A hypothesis test is a process that uses
                              sample statistics to test a claim about the
                              value of a population parameter.
                            • A verbal statement, or claim, about a
                              population parameter is called a statistical
                              hypothesis.
                            • Hypothesis: Proportion of apples = 0.5
                                                                                    7
Applications of testing
• Law and Forensics
  • Testing for discrimination (in admission, hiring, pay, promotion practices, etc.) –
    test of proportions
  • Paternity testing
  • Testing whether evidence found on a suspect came from the crime scene (blood,
    fiber, fingerprints, ...)




                                                                                                Venkat Reddy
                                                                                          Data Analysis Course
  • Indeed, testing whether the defendant is guilty or not
• Medicine and Health
  • Testing if a new drug is effective or ineffective-test of association
  • Testing if particulate matter in air pollution causes lung cancer
  • Testing if a particular gene is responsible for hemophilia
• Industry/Business/Economics
  • Testing if a production machine is ‘in control’ or not-test of means
  • Testing if a silicon wafer is good for use
• Science and Engineering
  • Testing which theory of gravitation is correct, based on dark matter                         8
  • Testing whether men and woman differ according to a psychological trait
  • Testing if two species have a common ancestor
Hypothesis Testing – An example
CEO of SBI claims employees mean age in SBI bank is 35
(292,215 employees). How can we prove or disprove it?
1. Take a random sample (500) and find their age
2. If it is near 35 then we say there is no evidence to




                                                                   Venkat Reddy
                                                             Data Analysis Course
   reject that hypothesis
3. What if sample average age is lower or higher than 35 ?
4. How far is really far? We want to quantify the severity
   of deviation……
5. Lets find the probability of this occurrence
6. It if it is really low, then we say we reject null
   hypothesis                                                       9
Hypothesis testing process
 Assume the
 population
 mean age is 35.
 (Null Hypothesis)




                                                     Venkat Reddy
                                               Data Analysis Course
                                  Population
                                  100,000
                     The Sample
Is X  40   35?   Mean Is 40
 No, not likely!

    REJECT

Null Hypothesis                     Sample         10
                                    500
Reason for Rejecting H0
              Sampling Distribution


                                              ... Therefore, we reject
                                                 the null hypothesis




                                                                               Venkat Reddy
                                                                         Data Analysis Course
                                                     that  = 35.




            ... if in fact this were
             the population mean.



                     = 35                   40
                                  It is unlikely that we would get a         11
                       H0         sample mean of this value ...
Sampling distribution -Recap
   • A sampling distribution is the probability distribution of a sample
     statistic that is formed when samples of size n are repeatedly taken
     from a population.
   • If the sample statistic is the sample mean, then the distribution is
     the sampling distribution of sample means.




                                                                                  Venkat Reddy
                                                                            Data Analysis Course
                                                  Sample

                  Sample

                             Sample                             Sample
      Sample                                     Sample



The sampling distribution consists of the values of the sample
                                                                                12
means,
Central Limit theorem -Recap
 If a sample n (30) is taken from a population with
 any type distribution that has a mean =
 and standard deviation =
the sample means will have a normal distribution
                                 and standard deviation




                                                                    Venkat Reddy
                                                              Data Analysis Course
                                                                  13

                                                          x
Five Step in Testing of Hypothesis
1.   Make Assumptions and meet test requirements.

2.   State the null hypothesis.




                                                                         Venkat Reddy
                                                                   Data Analysis Course
3.   Select the sampling distribution and establish the critical
     region.

4.   Compute the test statistic.

5.   Make a decision and interpret results.
                                                                       14
Step 1: Make Assumptions and Meet Test
Requirements
• Random sampling
  • Hypothesis testing assumes samples were selected using random
    sampling.
  • In this case, the sample of 500 cases was randomly selected from
    all major branches.




                                                                             Venkat Reddy
                                                                       Data Analysis Course
• Level of Measurement is Interval-Ratio.
  • Yes age is not a categorical variable
• Sampling Distribution is normal in shape.
  • What is the sampling distribution of age?
• This is a “large” sample (n≥100).

                                                                           15
Step 2 State the Null Hypothesis
• H0: μ = 35
• In other words, Ho: No difference between the sample mean
  and the population parameter
• In other words, The sample mean of 40 is really the same as




                                                                        Venkat Reddy
                                                                  Data Analysis Course
  the population mean of 35 – the difference is not real but is
  due to chance.
• In other words, The sample of 500 comes from a population
  that has average age of 35
• In other words, The difference between 35 and 40 is trivial
  and caused by random chance.

                                                                      16
Step 2 (cont.) State the Alternate Hypothesis

• H1: μ≠35
• Or H1: There is a difference between the sample mean and
  the population parameter
• Or The sample of 500 comes a population that does not have




                                                                           Venkat Reddy
                                                                     Data Analysis Course
  average age 35 In reality, it comes from a different population.
• Or The difference between 40 and 35 reflects an actual
  difference
• Or the average age of the population is more than 35




                                                                         17
Step 3 Select Sampling Distribution and
Establish the Critical Region
• What is the sampling distribution of population mean?
• What is alpha?
   • Probability of rejecting H0 when it is true
• α is the indicator of “rare” events.
• Any difference with a probability less than α is rare and will cause us




                                                                                  Venkat Reddy
                                                                            Data Analysis Course
  to reject the H0.
• We started with H0 as true, we still want to reject the null if the
  statistic is beyond a certain value,
• We already know about some unlikely values of test statistic when
  null hypothesis is true
• for example if the average age of the sample is 60, we definitely
  want to reject null

Details later                                                                   18
Step 4: Use Formula to Compute the Test
Statistic - Z for large samples (≥ 100)
 • We got the sample average as 40, the age according to null hypothesis
   is 35, there is a difference of 5, is it due to chance?
 • How bad is this difference of 5?


                    
                 Z




                                                                                 Venkat Reddy
                                                                           Data Analysis Course
                     N
 When the Population σ is not known, use the following formula:


                                        40  35
       Z                            Z
          s n 1                        7.86 500  1
                                                                               19
Step 5 Make a Decision and Interpret
 Results
  • The obtained Z score fell in the Critical Region, so we reject the H0.
     • If the H0 were true, a sample outcome of 14 would be unlikely.
     • Therefore, the H0 is false and must be rejected.


                                                                 a




                                                                                         Venkat Reddy
                                                                                   Data Analysis Course
     H0:   35
     H1:  > 35                                                         P-Value

                                                   0
What does z of 14 mean? The probability of z being more than 14 is less than
0.000000001
• It is like getting more than 25 heads in a row when you toss a coin
• It is like drawing the same card more than 6 times from a shuffled deck
• If the average age of 40 is just by chance then compare that chance with above       20
   examples
What is P-Value
• If the observed statistic happens to be just a chance, p-values tells
  us what is the probability of that chance
• The P-value answer the question: What is the probability of the
  observed test statistic or one more extreme when H0 is true?
• Given H0, probability of the current value or extreme than this




                                                                                 Venkat Reddy
                                                                           Data Analysis Course
• Given H0 is true, probability of obtaining a result as extreme or more
  extreme than the actual sample
• The observed significance level, or p-value of a test of hypothesis is
  the probability of obtaining the observed value of the sample
  statistic, or one which is even more supportive of the alternative
  hypothesis, under the assumption that the null hypothesis is true.
• Smallest α the observed sample would reject H0
                                                                               21
P -value
• If the alternative hypothesis contains the greater-than symbol (>), the
  hypothesis test is a right-tailed test.


   H0: μ =k
   Ha: μ > k




                                                                                          Venkat Reddy
                                                                                    Data Analysis Course
                                                             P is the area to
                                                             the right of the
                                                             test statistic.



                                                                                z
                     -3    -2     -1    0     1          2       3
                                               Test                                     22
                                             statistic
One tail & two tailed tests
• Two-tailed Test
  • If the alternative hypothesis contains the not-equal-to symbol (),
    the hypothesis test is a two-tailed test. In a two-tailed test, each
    tail has an area of 0.5P.
        • H0: μ = k
        • Ha: μ  k




                                                                                 Venkat Reddy
                                                                           Data Analysis Course
• Left-tailed Test
  • If the alternative hypothesis contains the less-than inequality
    symbol (<), the hypothesis test is a left-tailed test.
        • H0: μ  k
        • Ha: μ < k
• Right-tailed Test
  • If the alternative hypothesis contains the less-than inequality
    symbol (<), the hypothesis test is a left-tailed test.
        • H0: μ  k                                                            23
        • Ha: μ < k
Types of errors
• No matter which hypothesis represents the claim, always
  begin the hypothesis test assuming that the null hypothesis is
  true.
• At the end of the test, one of two decisions will be made:




                                                                           Venkat Reddy
                                                                     Data Analysis Course
  • reject the null hypothesis, or
  • fail to reject the null hypothesis.
• A type I error occurs if the null hypothesis is rejected when it
  is true.
• A type II error occurs if the null hypothesis is not rejected
  when it is false.

                                                                         24
Error Types
• Type I Error: Reject H0 when it is true
• Type II Error: Do not reject H0 when it is false


       Test Result –                      Don’t Reject




                                                               Venkat Reddy
                                                         Data Analysis Course
                          Reject H0
                                              H0
     Reality
     H0 True           Type I Error     Correct


     H0 False          Correct          Type II Error

                                                             25
Level of Significance a
• Defines Unlikely Values of Sample Statistic if Null Hypothesis Is
  True
  • Called Rejection Region of Sampling Distribution
• Designated a (alpha)




                                                                            Venkat Reddy
                                                                      Data Analysis Course
• Typical values are 0.01, 0.05, 0.10
• Selected by the Researcher at the Start Provides the Critical
  Value(s) of the Test
• P(Type I error)
• Think of analogy with SBI average age


                                                                          26
Level of Significance, a and the Rejection
  Region

H0:   35                           a         Critical
                                               Value(s)
H1:  < 35
                                 0




                                                                Venkat Reddy
                                                          Data Analysis Course
             Rejection Regions
                                           a
H0:   35
H1:  > 35                       0
                                           a/2
H0:   35
H1:   35
                                 0                            27
Power of test 1-b
• P(Type II error) is b
• P(Type II error) = b depends on the true value of the
  parameter (from the range of values in Ha ).
• The farther the true parameter value falls from the null value,




                                                                           Venkat Reddy
                                                                     Data Analysis Course
  the easier it is to reject null, and P(Type II error) goes down.
• Power of test = 1 - b = P(reject null, given it is false)
• In practice, you want a large enough n for your study so that
  P(Type II error) is small for the size of effect you expect.



                                                                         28
Which error is bad?
• False negative
  • Miss what could be important
      • Testing a metal whether it is gold or not
      • Are these samples going to be looked at again?
• False positive




                                                               Venkat Reddy
                                                         Data Analysis Course
  • Waste resources following dead ends
      • Test whether a drug is deadly or not




                                                             29
Confidence Intervals
• Hypothesis testing focuses on where the sample mean is
  located
• Confidence intervals focus on plausible values for the
  population mean
• General Formula (1-α)% CI for μ




                                                                 Venkat Reddy
                                                           Data Analysis Course
                Z1a / 2      Z1a / 2 
            X            ,X            
                    n              n 


• Construct an interval around the point estimate
• Look to see if the population/null mean is inside            30
Significance Test for Mean
                                 
  For large samples         Z
                                  N

                                y  0
  For small samples t                   where se  s / n
                                  se




                                                                                             Venkat Reddy
                                                                                       Data Analysis Course
• Sampling distribution for small samples is t
• The curve of the t distribution varies with sample size (the smaller the size, the
  flatter the curve)
• In using the t-table, we use “degrees of freedom” based on the sample size.
• For a one-sample test, df = n – 1.
• When looking at the table, find the t-value for the appropriate df = n-1. This
  will be the cutoff point for your critical region.                                       31
Lab: Significance Test for Mean
• It is known that the mean cholesterol level for the nation is 190. We
  test 100 only children and find that the sample average cholesterol level is
  198 and suppose we know the population standard deviation  = 15. does
  that signify that only children have an average higher cholesterol level than
  the national average?
• Given this sample what are the reasonable values for population mean?




                                                                                        Venkat Reddy
                                                                                  Data Analysis Course
• 50 smokers were questioned about the number of hours they sleep each
  day. We want to test the hypothesis that the smokers need less sleep than
  the general public which needs an average of 7.7 hours of sleep. If the
  sample mean is 7.5 and the population standard deviation is 0.5, what can
  you conclude?
• Given this sample what are the 95% confidence limits for population mean?

                                                                                      32
Test of Proportion
• Assumptions:
  • Categorical variable
  • Randomization
  • Large sample (but two-sided ok for nearly all n)




                                                             Venkat Reddy
                                                       Data Analysis Course
• Hypotheses:
   • Null hypothesis: H0: p  p0
   • Alternative hypothesis: Ha: p  p0 (2-sided)
   • Ha: p > p0 Ha: p < p0 (1-sided)
   • Set up hypotheses before getting the data
• Test statistic:
                     p p0
                     ˆ           p p0
                                  ˆ
                  z                                      33
                            p 0 (1  p 0 ) / n
                          pˆ
Lab: Test of Proportion
• Suppose a coin toss turns up 12 heads out of 20 trials. At .05
  significance level, can one reject the null hypothesis that the
  coin toss is fair?
• Suppose that you interview 1000 exiting voters about who
  they voted for PM. Of the 1000 voters, 550 reported that they




                                                                          Venkat Reddy
                                                                    Data Analysis Course
  voted for Rahul Gandhi. Is there sufficient evidence to suggest
  that Rahul Gandhi will win the election at the .01 level?




                                                                        34
Chi square test for Independence
• Chi square test of independence
• Is happiness independent of family income? A sample of 2955
  families are studied




                                                                                           Venkat Reddy
                                                                                     Data Analysis Course
               Happiness
                           Very         Pretty         Not too        Total
        Income Above              272            294             49           615
               Average            454            835         131              1420
               Below              185            527         208              920




                                                                                         35
Chi-square statistic
• Summarize closeness of {fo} and {fe} by

                      ( fo  fe )2
               2  
                           fe




                                                                      Venkat Reddy
                                                                Data Analysis Course
 where sum is taken over all cells in the table.

• When H0 is true, sampling distribution of this statistic is
  approximately (for large n) the chi-squared probability
  distribution.


                                                                    36
Chi-square calculation
• In happiness and family income

                 ( f o  f e )2 (272  189.6)2
             
              2
                                               ...  172.3
                       fe           189.6




                                                                                     Venkat Reddy
                                                                               Data Analysis Course
• df = (3 – 1)(3 – 1) = 4. P-value = 0.000 (rounded, often reported as P <
  0.001). Chi-squared percentile values for various right-tail probabilities
  are in table on text p. 594.
• There is very strong evidence against H0: independence (If H0 were
  true, prob. would be < 0.001 of getting this large a 2 test statistic or
  even larger).
• For significance level a = 0.05 (or a = 0.01 or a = 0.001), we reject H0
  and conclude that an association exists between happiness and
  income.                                                                          37
Lab Chi-square distribution
• Suppose that 125 children are shown three television commercials for
  breakfast cereal and are asked to pick which they liked best. The results are
  shown in table below. You would like to know if the choice of favorite
  commercial was related to whether the child was a boy or a girl or if these
  two variables are independent




                                                                                        Venkat Reddy
                                                                                  Data Analysis Course
             A           B         C         Totals
    Boys     30          29        16        75
    Girls    12          33        5         50
    Totals   42          62        21        125


• Suppose you conducted a drug trial on a group of animals and you
  hypothesized that the animals receiving the drug would show increased
  heart rates compared to those that did not receive the drug. You conduct
  the study and collect the following data.
                  Heart Rate Increased No Heart Rate Increase   Total
                                                                                      38
  Treated     36                        14                      50
  Not treated 30                        25                      55
  Total       66                        39                      105
Further Reading
• Test of samples means for two populations
• Test of sample proportions for two populations
• Odds ration for test of association




                                                         Venkat Reddy
                                                   Data Analysis Course
                                                       39
Venkat Reddy Konasani
Manager at Trendwise Analytics
venkat@TrendwiseAnalytics.com
21.venkat@gmail.com




                                       Venkat Reddy
                                 Data Analysis Course
+91 9886 768879




                                     40

More Related Content

What's hot

General research methodology mpharm
General research methodology  mpharmGeneral research methodology  mpharm
General research methodology mpharmAlkaDiwakar
 
Probability ,Binomial distribution, Normal distribution, Poisson’s distributi...
Probability ,Binomial distribution, Normal distribution, Poisson’s distributi...Probability ,Binomial distribution, Normal distribution, Poisson’s distributi...
Probability ,Binomial distribution, Normal distribution, Poisson’s distributi...AZCPh
 
NON-PARAMETRIC TESTS by Prajakta Sawant
NON-PARAMETRIC TESTS by Prajakta SawantNON-PARAMETRIC TESTS by Prajakta Sawant
NON-PARAMETRIC TESTS by Prajakta SawantPRAJAKTASAWANT33
 
Parametric test _ t test and ANOVA _ Biostatistics and Research Methodology....
Parametric test _ t test and ANOVA _  Biostatistics and Research Methodology....Parametric test _ t test and ANOVA _  Biostatistics and Research Methodology....
Parametric test _ t test and ANOVA _ Biostatistics and Research Methodology....AZCPh
 
Unit 2 Regression- BSRM.pdf
Unit 2 Regression- BSRM.pdfUnit 2 Regression- BSRM.pdf
Unit 2 Regression- BSRM.pdfRavinandan A P
 
Non-parametric Statistical tests for Hypotheses testing
Non-parametric Statistical tests for Hypotheses testingNon-parametric Statistical tests for Hypotheses testing
Non-parametric Statistical tests for Hypotheses testingSundar B N
 
Parametric tests
Parametric testsParametric tests
Parametric testsheena45
 
PPT on Sample Size, Importance of Sample Size,
PPT on Sample Size, Importance of Sample Size,PPT on Sample Size, Importance of Sample Size,
PPT on Sample Size, Importance of Sample Size,Naveen K L
 
Introduction to Research - Biostatistics and Research methodology 8th Sem Uni...
Introduction to Research - Biostatistics and Research methodology 8th Sem Uni...Introduction to Research - Biostatistics and Research methodology 8th Sem Uni...
Introduction to Research - Biostatistics and Research methodology 8th Sem Uni...Himanshu Sharma
 
Nonparametric tests
Nonparametric testsNonparametric tests
Nonparametric testsArun Kumar
 
Statistical tests of significance and Student`s T-Test
Statistical tests of significance and Student`s T-TestStatistical tests of significance and Student`s T-Test
Statistical tests of significance and Student`s T-TestVasundhraKakkar
 

What's hot (20)

Non parametric test
Non parametric testNon parametric test
Non parametric test
 
General research methodology mpharm
General research methodology  mpharmGeneral research methodology  mpharm
General research methodology mpharm
 
Standard error of the mean
Standard error of the meanStandard error of the mean
Standard error of the mean
 
Probability ,Binomial distribution, Normal distribution, Poisson’s distributi...
Probability ,Binomial distribution, Normal distribution, Poisson’s distributi...Probability ,Binomial distribution, Normal distribution, Poisson’s distributi...
Probability ,Binomial distribution, Normal distribution, Poisson’s distributi...
 
Regression ppt
Regression pptRegression ppt
Regression ppt
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
NON-PARAMETRIC TESTS by Prajakta Sawant
NON-PARAMETRIC TESTS by Prajakta SawantNON-PARAMETRIC TESTS by Prajakta Sawant
NON-PARAMETRIC TESTS by Prajakta Sawant
 
Parametric test _ t test and ANOVA _ Biostatistics and Research Methodology....
Parametric test _ t test and ANOVA _  Biostatistics and Research Methodology....Parametric test _ t test and ANOVA _  Biostatistics and Research Methodology....
Parametric test _ t test and ANOVA _ Biostatistics and Research Methodology....
 
Unit 2 Regression- BSRM.pdf
Unit 2 Regression- BSRM.pdfUnit 2 Regression- BSRM.pdf
Unit 2 Regression- BSRM.pdf
 
Non-parametric Statistical tests for Hypotheses testing
Non-parametric Statistical tests for Hypotheses testingNon-parametric Statistical tests for Hypotheses testing
Non-parametric Statistical tests for Hypotheses testing
 
Statistical parameters
Statistical parametersStatistical parameters
Statistical parameters
 
Experimental design techniques
Experimental design techniquesExperimental design techniques
Experimental design techniques
 
Regression
RegressionRegression
Regression
 
Kruskal Wall Test
Kruskal Wall TestKruskal Wall Test
Kruskal Wall Test
 
Parametric tests
Parametric testsParametric tests
Parametric tests
 
PPT on Sample Size, Importance of Sample Size,
PPT on Sample Size, Importance of Sample Size,PPT on Sample Size, Importance of Sample Size,
PPT on Sample Size, Importance of Sample Size,
 
Introduction to Research - Biostatistics and Research methodology 8th Sem Uni...
Introduction to Research - Biostatistics and Research methodology 8th Sem Uni...Introduction to Research - Biostatistics and Research methodology 8th Sem Uni...
Introduction to Research - Biostatistics and Research methodology 8th Sem Uni...
 
Nonparametric tests
Nonparametric testsNonparametric tests
Nonparametric tests
 
Non-Parametric Tests
Non-Parametric TestsNon-Parametric Tests
Non-Parametric Tests
 
Statistical tests of significance and Student`s T-Test
Statistical tests of significance and Student`s T-TestStatistical tests of significance and Student`s T-Test
Statistical tests of significance and Student`s T-Test
 

Viewers also liked (20)

Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
 
Hypothesis testing ppt final
Hypothesis testing ppt finalHypothesis testing ppt final
Hypothesis testing ppt final
 
Test of hypothesis
Test of hypothesisTest of hypothesis
Test of hypothesis
 
Hypothesis
HypothesisHypothesis
Hypothesis
 
Hypothesis
HypothesisHypothesis
Hypothesis
 
Statistical Distributions
Statistical DistributionsStatistical Distributions
Statistical Distributions
 
Testing of hypothesis case study
Testing of hypothesis case study Testing of hypothesis case study
Testing of hypothesis case study
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
 
Big data Introduction by Mohan
Big data Introduction by MohanBig data Introduction by Mohan
Big data Introduction by Mohan
 
Hypothesis testing; z test, t-test. f-test
Hypothesis testing; z test, t-test. f-testHypothesis testing; z test, t-test. f-test
Hypothesis testing; z test, t-test. f-test
 
Introduction to predictive modeling v1
Introduction to predictive modeling v1Introduction to predictive modeling v1
Introduction to predictive modeling v1
 
ARIMA
ARIMA ARIMA
ARIMA
 
Multiple regression
Multiple regressionMultiple regression
Multiple regression
 
Step By Step Guide to Learn R
Step By Step Guide to Learn RStep By Step Guide to Learn R
Step By Step Guide to Learn R
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
Decision tree
Decision treeDecision tree
Decision tree
 
SAS basics Step by step learning
SAS basics Step by step learningSAS basics Step by step learning
SAS basics Step by step learning
 
Credit Risk Model Building Steps
Credit Risk Model Building StepsCredit Risk Model Building Steps
Credit Risk Model Building Steps
 
Learning Tableau - Data, Graphs, Filters, Dashboards and Advanced features
Learning Tableau -  Data, Graphs, Filters, Dashboards and Advanced featuresLearning Tableau -  Data, Graphs, Filters, Dashboards and Advanced features
Learning Tableau - Data, Graphs, Filters, Dashboards and Advanced features
 

Similar to Testing of hypothesis

How to Design Research from Ilm Ideas on Slide Share
How to Design Research from Ilm Ideas on Slide Share How to Design Research from Ilm Ideas on Slide Share
How to Design Research from Ilm Ideas on Slide Share ilmideas
 
How to Develop and Implement Effective Research Tools from Ilm Ideas on Slide...
How to Develop and Implement Effective Research Tools from Ilm Ideas on Slide...How to Develop and Implement Effective Research Tools from Ilm Ideas on Slide...
How to Develop and Implement Effective Research Tools from Ilm Ideas on Slide...ilmideas
 
Qualitative Methods Workshop Day 1
Qualitative Methods Workshop Day 1Qualitative Methods Workshop Day 1
Qualitative Methods Workshop Day 1Jason Rutter
 
Some nonparametric statistic for categorical &amp; ordinal data
Some nonparametric statistic for categorical &amp; ordinal dataSome nonparametric statistic for categorical &amp; ordinal data
Some nonparametric statistic for categorical &amp; ordinal dataRegent University
 
Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.pptmanaswidebbarma1
 
hypothesis testing
 hypothesis testing hypothesis testing
hypothesis testingkpgandhi
 
Updated final exam review psyc 2950.002(1)
Updated final exam review psyc 2950.002(1)Updated final exam review psyc 2950.002(1)
Updated final exam review psyc 2950.002(1)jbnx
 
De-Mystifying Stats: A primer on basic statistics
De-Mystifying Stats: A primer on basic statisticsDe-Mystifying Stats: A primer on basic statistics
De-Mystifying Stats: A primer on basic statisticsGillian Byrne
 
Data Science 101
Data Science 101Data Science 101
Data Science 101ideatoipo
 
Lecture 2 What is Statistics, Anyway
Lecture 2 What is Statistics, AnywayLecture 2 What is Statistics, Anyway
Lecture 2 What is Statistics, AnywayJason Edington
 
Presentation research- chapter 10-11 istiqlal
Presentation research- chapter 10-11 istiqlalPresentation research- chapter 10-11 istiqlal
Presentation research- chapter 10-11 istiqlalIstiqlalEid
 
Case Studies: When you can't or won't run an experiment (and still want to...
Case Studies: When you can't or  won't run an  experiment (and still  want to...Case Studies: When you can't or  won't run an  experiment (and still  want to...
Case Studies: When you can't or won't run an experiment (and still want to...David Saldaña Sage
 
Case Studies: When you can't or won't run an experiment (and still want to...
Case Studies: When you can't or  won't run an  experiment (and still  want to...Case Studies: When you can't or  won't run an  experiment (and still  want to...
Case Studies: When you can't or won't run an experiment (and still want to...David Saldaña
 
probability sampling
probability samplingprobability sampling
probability samplingRoshni Kapoor
 
Chapter 5 class version b(1)
Chapter 5 class version b(1)Chapter 5 class version b(1)
Chapter 5 class version b(1)jbnx
 

Similar to Testing of hypothesis (20)

How to Design Research from Ilm Ideas on Slide Share
How to Design Research from Ilm Ideas on Slide Share How to Design Research from Ilm Ideas on Slide Share
How to Design Research from Ilm Ideas on Slide Share
 
How to Develop and Implement Effective Research Tools from Ilm Ideas on Slide...
How to Develop and Implement Effective Research Tools from Ilm Ideas on Slide...How to Develop and Implement Effective Research Tools from Ilm Ideas on Slide...
How to Develop and Implement Effective Research Tools from Ilm Ideas on Slide...
 
Qualitative Methods Workshop Day 1
Qualitative Methods Workshop Day 1Qualitative Methods Workshop Day 1
Qualitative Methods Workshop Day 1
 
Some nonparametric statistic for categorical &amp; ordinal data
Some nonparametric statistic for categorical &amp; ordinal dataSome nonparametric statistic for categorical &amp; ordinal data
Some nonparametric statistic for categorical &amp; ordinal data
 
Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.ppt
 
CS194Lec0hbh6EDA.pptx
CS194Lec0hbh6EDA.pptxCS194Lec0hbh6EDA.pptx
CS194Lec0hbh6EDA.pptx
 
hypothesis testing
 hypothesis testing hypothesis testing
hypothesis testing
 
Updated final exam review psyc 2950.002(1)
Updated final exam review psyc 2950.002(1)Updated final exam review psyc 2950.002(1)
Updated final exam review psyc 2950.002(1)
 
De-Mystifying Stats: A primer on basic statistics
De-Mystifying Stats: A primer on basic statisticsDe-Mystifying Stats: A primer on basic statistics
De-Mystifying Stats: A primer on basic statistics
 
Data Mining Lecture_2.pptx
Data Mining Lecture_2.pptxData Mining Lecture_2.pptx
Data Mining Lecture_2.pptx
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
 
Lecture 2 What is Statistics, Anyway
Lecture 2 What is Statistics, AnywayLecture 2 What is Statistics, Anyway
Lecture 2 What is Statistics, Anyway
 
Presentation research- chapter 10-11 istiqlal
Presentation research- chapter 10-11 istiqlalPresentation research- chapter 10-11 istiqlal
Presentation research- chapter 10-11 istiqlal
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Case Studies: When you can't or won't run an experiment (and still want to...
Case Studies: When you can't or  won't run an  experiment (and still  want to...Case Studies: When you can't or  won't run an  experiment (and still  want to...
Case Studies: When you can't or won't run an experiment (and still want to...
 
Case Studies: When you can't or won't run an experiment (and still want to...
Case Studies: When you can't or  won't run an  experiment (and still  want to...Case Studies: When you can't or  won't run an  experiment (and still  want to...
Case Studies: When you can't or won't run an experiment (and still want to...
 
What is research
What is researchWhat is research
What is research
 
probability sampling
probability samplingprobability sampling
probability sampling
 
Chapter 5 class version b(1)
Chapter 5 class version b(1)Chapter 5 class version b(1)
Chapter 5 class version b(1)
 

More from Venkata Reddy Konasani

Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Venkata Reddy Konasani
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniquesVenkata Reddy Konasani
 
Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS Venkata Reddy Konasani
 
Data exploration validation and sanitization
Data exploration validation and sanitizationData exploration validation and sanitization
Data exploration validation and sanitizationVenkata Reddy Konasani
 
Model building in credit card and loan approval
Model building in credit card and loan approval Model building in credit card and loan approval
Model building in credit card and loan approval Venkata Reddy Konasani
 
Data Exploration, Validation and Sanitization
Data Exploration, Validation and SanitizationData Exploration, Validation and Sanitization
Data Exploration, Validation and SanitizationVenkata Reddy Konasani
 

More from Venkata Reddy Konasani (17)

Transformers 101
Transformers 101 Transformers 101
Transformers 101
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniques
 
Neural Network Part-2
Neural Network Part-2Neural Network Part-2
Neural Network Part-2
 
GBM theory code and parameters
GBM theory code and parametersGBM theory code and parameters
GBM theory code and parameters
 
Neural Networks made easy
Neural Networks made easyNeural Networks made easy
Neural Networks made easy
 
Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS
 
L101 predictive modeling case_study
L101 predictive modeling case_studyL101 predictive modeling case_study
L101 predictive modeling case_study
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Online data sources for analaysis
Online data sources for analaysis Online data sources for analaysis
Online data sources for analaysis
 
R- Introduction
R- IntroductionR- Introduction
R- Introduction
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
Data exploration validation and sanitization
Data exploration validation and sanitizationData exploration validation and sanitization
Data exploration validation and sanitization
 
Data Analyst - Interview Guide
Data Analyst - Interview GuideData Analyst - Interview Guide
Data Analyst - Interview Guide
 
Model building in credit card and loan approval
Model building in credit card and loan approval Model building in credit card and loan approval
Model building in credit card and loan approval
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Data Exploration, Validation and Sanitization
Data Exploration, Validation and SanitizationData Exploration, Validation and Sanitization
Data Exploration, Validation and Sanitization
 

Recently uploaded

call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxAnaBeatriceAblay2
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 

Recently uploaded (20)

call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 

Testing of hypothesis

  • 1. Data Analysis Course Testing of Hypothesis Venkat Reddy
  • 2. Data Analysis Course • Data analysis design document • Introduction to statistical data analysis • Descriptive statistics • Data exploration, validation & sanitization • Probability distributions examples and applications Venkat Reddy Data Analysis Course • Simple correlation and regression analysis • Multiple liner regression analysis • Logistic regression analysis • Testing of hypothesis • Clustering and decision trees • Time series analysis and forecasting • Credit Risk Model building-1 2 • Credit Risk Model building-2
  • 3. Note • This presentation is just class notes. The course notes for Data Analysis Training is by written by me, as an aid for myself. • The best way to treat this is as a high-level summary; the actual session went more in depth and contained other Venkat Reddy Data Analysis Course information. • Most of this material was written as informal notes, not intended for publication • Please send questions/comments/corrections to venkat@trenwiseanalytics.com or 21.venkat@gmail.com • Please check my website for latest version of this document -Venkat Reddy 3
  • 4. Contents • What is the need of testing • Recap of sampling distribution • Hypothesis testing • Five main steps in testing • Assumptions • Hypotheses Venkat Reddy Data Analysis Course • Test Statistic • P-value (P) • Conclusion: • Testing example • Types of errors • Testing for Means • Z test • T test • Testing for Proportions • Test of independence 4
  • 5. Inference • A cake, (weighs 20 Kg) how do you decide about its taste? A piece of cake or after eating completely • A truck half filled with oranges rest with apples. How would you verify that? By counting them? • Apple & Samsung sell mobiles all over the world. If you want to find Venkat Reddy Data Analysis Course which one people prefer, Do you take the opinion poll from all the users around the world? • Product manager claims that girls are buying their product more than boys? Is there any association between gender and buying or not buying? 5
  • 6. Inference • A cake, (weighs 20 Kg) how do you decide about its taste? A piece of cake or after eating completely • You took a piece of cake, you are not completely satisfied with the taste –Would you recommend the cake? • You took a 10 gram piece of cake, you didn’t like the taste—What would you say? • You took a 1 milligram piece of cake, you liked the taste—What would you say? Venkat Reddy Data Analysis Course • A truck half filled with oranges rest with apples (owner claims 10,000 oranges & 10,000 apples) . How would you verify that? By counting them? • You randomly picked 200 fruits, 120 are apples and 80 are oranges. What would you say? • You randomly picked 200 fruits, 190 are apples and 10s are oranges. What would you say? How bad in the sample is bad enough to say the population is also bad 6
  • 7. Statistical Inference • Inferences about a population are made on the basis of results obtained from a sample drawn from that population • Want to talk about the larger population from which the subjects are drawn, not the particular subjects! Venkat Reddy Data Analysis Course • A hypothesis test is a process that uses sample statistics to test a claim about the value of a population parameter. • A verbal statement, or claim, about a population parameter is called a statistical hypothesis. • Hypothesis: Proportion of apples = 0.5 7
  • 8. Applications of testing • Law and Forensics • Testing for discrimination (in admission, hiring, pay, promotion practices, etc.) – test of proportions • Paternity testing • Testing whether evidence found on a suspect came from the crime scene (blood, fiber, fingerprints, ...) Venkat Reddy Data Analysis Course • Indeed, testing whether the defendant is guilty or not • Medicine and Health • Testing if a new drug is effective or ineffective-test of association • Testing if particulate matter in air pollution causes lung cancer • Testing if a particular gene is responsible for hemophilia • Industry/Business/Economics • Testing if a production machine is ‘in control’ or not-test of means • Testing if a silicon wafer is good for use • Science and Engineering • Testing which theory of gravitation is correct, based on dark matter 8 • Testing whether men and woman differ according to a psychological trait • Testing if two species have a common ancestor
  • 9. Hypothesis Testing – An example CEO of SBI claims employees mean age in SBI bank is 35 (292,215 employees). How can we prove or disprove it? 1. Take a random sample (500) and find their age 2. If it is near 35 then we say there is no evidence to Venkat Reddy Data Analysis Course reject that hypothesis 3. What if sample average age is lower or higher than 35 ? 4. How far is really far? We want to quantify the severity of deviation…… 5. Lets find the probability of this occurrence 6. It if it is really low, then we say we reject null hypothesis 9
  • 10. Hypothesis testing process Assume the population mean age is 35. (Null Hypothesis) Venkat Reddy Data Analysis Course Population 100,000 The Sample Is X  40   35? Mean Is 40 No, not likely! REJECT Null Hypothesis Sample 10 500
  • 11. Reason for Rejecting H0 Sampling Distribution ... Therefore, we reject the null hypothesis Venkat Reddy Data Analysis Course that  = 35. ... if in fact this were the population mean.  = 35 40 It is unlikely that we would get a 11 H0 sample mean of this value ...
  • 12. Sampling distribution -Recap • A sampling distribution is the probability distribution of a sample statistic that is formed when samples of size n are repeatedly taken from a population. • If the sample statistic is the sample mean, then the distribution is the sampling distribution of sample means. Venkat Reddy Data Analysis Course Sample Sample Sample Sample Sample Sample The sampling distribution consists of the values of the sample 12 means,
  • 13. Central Limit theorem -Recap If a sample n (30) is taken from a population with any type distribution that has a mean = and standard deviation = the sample means will have a normal distribution and standard deviation Venkat Reddy Data Analysis Course 13 x
  • 14. Five Step in Testing of Hypothesis 1. Make Assumptions and meet test requirements. 2. State the null hypothesis. Venkat Reddy Data Analysis Course 3. Select the sampling distribution and establish the critical region. 4. Compute the test statistic. 5. Make a decision and interpret results. 14
  • 15. Step 1: Make Assumptions and Meet Test Requirements • Random sampling • Hypothesis testing assumes samples were selected using random sampling. • In this case, the sample of 500 cases was randomly selected from all major branches. Venkat Reddy Data Analysis Course • Level of Measurement is Interval-Ratio. • Yes age is not a categorical variable • Sampling Distribution is normal in shape. • What is the sampling distribution of age? • This is a “large” sample (n≥100). 15
  • 16. Step 2 State the Null Hypothesis • H0: μ = 35 • In other words, Ho: No difference between the sample mean and the population parameter • In other words, The sample mean of 40 is really the same as Venkat Reddy Data Analysis Course the population mean of 35 – the difference is not real but is due to chance. • In other words, The sample of 500 comes from a population that has average age of 35 • In other words, The difference between 35 and 40 is trivial and caused by random chance. 16
  • 17. Step 2 (cont.) State the Alternate Hypothesis • H1: μ≠35 • Or H1: There is a difference between the sample mean and the population parameter • Or The sample of 500 comes a population that does not have Venkat Reddy Data Analysis Course average age 35 In reality, it comes from a different population. • Or The difference between 40 and 35 reflects an actual difference • Or the average age of the population is more than 35 17
  • 18. Step 3 Select Sampling Distribution and Establish the Critical Region • What is the sampling distribution of population mean? • What is alpha? • Probability of rejecting H0 when it is true • α is the indicator of “rare” events. • Any difference with a probability less than α is rare and will cause us Venkat Reddy Data Analysis Course to reject the H0. • We started with H0 as true, we still want to reject the null if the statistic is beyond a certain value, • We already know about some unlikely values of test statistic when null hypothesis is true • for example if the average age of the sample is 60, we definitely want to reject null Details later 18
  • 19. Step 4: Use Formula to Compute the Test Statistic - Z for large samples (≥ 100) • We got the sample average as 40, the age according to null hypothesis is 35, there is a difference of 5, is it due to chance? • How bad is this difference of 5?  Z Venkat Reddy Data Analysis Course  N When the Population σ is not known, use the following formula:  40  35 Z Z s n 1 7.86 500  1 19
  • 20. Step 5 Make a Decision and Interpret Results • The obtained Z score fell in the Critical Region, so we reject the H0. • If the H0 were true, a sample outcome of 14 would be unlikely. • Therefore, the H0 is false and must be rejected. a Venkat Reddy Data Analysis Course H0:   35 H1:  > 35 P-Value 0 What does z of 14 mean? The probability of z being more than 14 is less than 0.000000001 • It is like getting more than 25 heads in a row when you toss a coin • It is like drawing the same card more than 6 times from a shuffled deck • If the average age of 40 is just by chance then compare that chance with above 20 examples
  • 21. What is P-Value • If the observed statistic happens to be just a chance, p-values tells us what is the probability of that chance • The P-value answer the question: What is the probability of the observed test statistic or one more extreme when H0 is true? • Given H0, probability of the current value or extreme than this Venkat Reddy Data Analysis Course • Given H0 is true, probability of obtaining a result as extreme or more extreme than the actual sample • The observed significance level, or p-value of a test of hypothesis is the probability of obtaining the observed value of the sample statistic, or one which is even more supportive of the alternative hypothesis, under the assumption that the null hypothesis is true. • Smallest α the observed sample would reject H0 21
  • 22. P -value • If the alternative hypothesis contains the greater-than symbol (>), the hypothesis test is a right-tailed test. H0: μ =k Ha: μ > k Venkat Reddy Data Analysis Course P is the area to the right of the test statistic. z -3 -2 -1 0 1 2 3 Test 22 statistic
  • 23. One tail & two tailed tests • Two-tailed Test • If the alternative hypothesis contains the not-equal-to symbol (), the hypothesis test is a two-tailed test. In a two-tailed test, each tail has an area of 0.5P. • H0: μ = k • Ha: μ  k Venkat Reddy Data Analysis Course • Left-tailed Test • If the alternative hypothesis contains the less-than inequality symbol (<), the hypothesis test is a left-tailed test. • H0: μ  k • Ha: μ < k • Right-tailed Test • If the alternative hypothesis contains the less-than inequality symbol (<), the hypothesis test is a left-tailed test. • H0: μ  k 23 • Ha: μ < k
  • 24. Types of errors • No matter which hypothesis represents the claim, always begin the hypothesis test assuming that the null hypothesis is true. • At the end of the test, one of two decisions will be made: Venkat Reddy Data Analysis Course • reject the null hypothesis, or • fail to reject the null hypothesis. • A type I error occurs if the null hypothesis is rejected when it is true. • A type II error occurs if the null hypothesis is not rejected when it is false. 24
  • 25. Error Types • Type I Error: Reject H0 when it is true • Type II Error: Do not reject H0 when it is false Test Result – Don’t Reject Venkat Reddy Data Analysis Course Reject H0 H0 Reality H0 True Type I Error Correct H0 False Correct Type II Error 25
  • 26. Level of Significance a • Defines Unlikely Values of Sample Statistic if Null Hypothesis Is True • Called Rejection Region of Sampling Distribution • Designated a (alpha) Venkat Reddy Data Analysis Course • Typical values are 0.01, 0.05, 0.10 • Selected by the Researcher at the Start Provides the Critical Value(s) of the Test • P(Type I error) • Think of analogy with SBI average age 26
  • 27. Level of Significance, a and the Rejection Region H0:   35 a Critical Value(s) H1:  < 35 0 Venkat Reddy Data Analysis Course Rejection Regions a H0:   35 H1:  > 35 0 a/2 H0:   35 H1:   35 0 27
  • 28. Power of test 1-b • P(Type II error) is b • P(Type II error) = b depends on the true value of the parameter (from the range of values in Ha ). • The farther the true parameter value falls from the null value, Venkat Reddy Data Analysis Course the easier it is to reject null, and P(Type II error) goes down. • Power of test = 1 - b = P(reject null, given it is false) • In practice, you want a large enough n for your study so that P(Type II error) is small for the size of effect you expect. 28
  • 29. Which error is bad? • False negative • Miss what could be important • Testing a metal whether it is gold or not • Are these samples going to be looked at again? • False positive Venkat Reddy Data Analysis Course • Waste resources following dead ends • Test whether a drug is deadly or not 29
  • 30. Confidence Intervals • Hypothesis testing focuses on where the sample mean is located • Confidence intervals focus on plausible values for the population mean • General Formula (1-α)% CI for μ Venkat Reddy Data Analysis Course  Z1a / 2 Z1a / 2  X  ,X    n n  • Construct an interval around the point estimate • Look to see if the population/null mean is inside 30
  • 31. Significance Test for Mean  For large samples Z  N y  0 For small samples t  where se  s / n se Venkat Reddy Data Analysis Course • Sampling distribution for small samples is t • The curve of the t distribution varies with sample size (the smaller the size, the flatter the curve) • In using the t-table, we use “degrees of freedom” based on the sample size. • For a one-sample test, df = n – 1. • When looking at the table, find the t-value for the appropriate df = n-1. This will be the cutoff point for your critical region. 31
  • 32. Lab: Significance Test for Mean • It is known that the mean cholesterol level for the nation is 190. We test 100 only children and find that the sample average cholesterol level is 198 and suppose we know the population standard deviation  = 15. does that signify that only children have an average higher cholesterol level than the national average? • Given this sample what are the reasonable values for population mean? Venkat Reddy Data Analysis Course • 50 smokers were questioned about the number of hours they sleep each day. We want to test the hypothesis that the smokers need less sleep than the general public which needs an average of 7.7 hours of sleep. If the sample mean is 7.5 and the population standard deviation is 0.5, what can you conclude? • Given this sample what are the 95% confidence limits for population mean? 32
  • 33. Test of Proportion • Assumptions: • Categorical variable • Randomization • Large sample (but two-sided ok for nearly all n) Venkat Reddy Data Analysis Course • Hypotheses: • Null hypothesis: H0: p  p0 • Alternative hypothesis: Ha: p  p0 (2-sided) • Ha: p > p0 Ha: p < p0 (1-sided) • Set up hypotheses before getting the data • Test statistic: p p0 ˆ p p0 ˆ z  33  p 0 (1  p 0 ) / n pˆ
  • 34. Lab: Test of Proportion • Suppose a coin toss turns up 12 heads out of 20 trials. At .05 significance level, can one reject the null hypothesis that the coin toss is fair? • Suppose that you interview 1000 exiting voters about who they voted for PM. Of the 1000 voters, 550 reported that they Venkat Reddy Data Analysis Course voted for Rahul Gandhi. Is there sufficient evidence to suggest that Rahul Gandhi will win the election at the .01 level? 34
  • 35. Chi square test for Independence • Chi square test of independence • Is happiness independent of family income? A sample of 2955 families are studied Venkat Reddy Data Analysis Course Happiness Very Pretty Not too Total Income Above 272 294 49 615 Average 454 835 131 1420 Below 185 527 208 920 35
  • 36. Chi-square statistic • Summarize closeness of {fo} and {fe} by ( fo  fe )2 2   fe Venkat Reddy Data Analysis Course where sum is taken over all cells in the table. • When H0 is true, sampling distribution of this statistic is approximately (for large n) the chi-squared probability distribution. 36
  • 37. Chi-square calculation • In happiness and family income ( f o  f e )2 (272  189.6)2   2   ...  172.3 fe 189.6 Venkat Reddy Data Analysis Course • df = (3 – 1)(3 – 1) = 4. P-value = 0.000 (rounded, often reported as P < 0.001). Chi-squared percentile values for various right-tail probabilities are in table on text p. 594. • There is very strong evidence against H0: independence (If H0 were true, prob. would be < 0.001 of getting this large a 2 test statistic or even larger). • For significance level a = 0.05 (or a = 0.01 or a = 0.001), we reject H0 and conclude that an association exists between happiness and income. 37
  • 38. Lab Chi-square distribution • Suppose that 125 children are shown three television commercials for breakfast cereal and are asked to pick which they liked best. The results are shown in table below. You would like to know if the choice of favorite commercial was related to whether the child was a boy or a girl or if these two variables are independent Venkat Reddy Data Analysis Course A B C Totals Boys 30 29 16 75 Girls 12 33 5 50 Totals 42 62 21 125 • Suppose you conducted a drug trial on a group of animals and you hypothesized that the animals receiving the drug would show increased heart rates compared to those that did not receive the drug. You conduct the study and collect the following data. Heart Rate Increased No Heart Rate Increase Total 38 Treated 36 14 50 Not treated 30 25 55 Total 66 39 105
  • 39. Further Reading • Test of samples means for two populations • Test of sample proportions for two populations • Odds ration for test of association Venkat Reddy Data Analysis Course 39
  • 40. Venkat Reddy Konasani Manager at Trendwise Analytics venkat@TrendwiseAnalytics.com 21.venkat@gmail.com Venkat Reddy Data Analysis Course +91 9886 768879 40