SlideShare a Scribd company logo
1 of 37
Best Practices in Equivalence
           Testing

          John Castura
        Compusense Inc.



                10th Sensometrics, Rotterdam, July 2010
10th Sensometrics, Rotterdam, July 2010
10th Sensometrics, Rotterdam, July 2010
10th Sensometrics, Rotterdam, July 2010
Equivalence Testing – Purposes

• Reformulation
   • e.g. Ingredient substitution

• Research and Development
   • e.g. Product matching

•Claims Substantiation
    • e.g. Detergent X cleans equivalently to the leading brand




                                    10th Sensometrics, Rotterdam, July 2010
ASTM E1958–07 Standard Guide for Sensory
Claim Substantiation

   Comparative
      Superiority
      Parity Equality / Equivalence
              Unsurpassed / Non-inferiority
   Non-Comparative




                           10th Sensometrics, Rotterdam, July 2010
ASTM

E1885-04 Standard Test Method for Sensory Analysis
         - Triangle Test
E1958-08 Standard Guide for Sensory Claim Substantiation
E2139-05 Standard Test Method for Same-Different Test
E2164-08 Standard Test Method for Directional Difference Test
E2610-08 Standard Test Method for Sensory Analysis
         - Duo-Trio Test




                                10th Sensometrics, Rotterdam, July 2010
ISO

ISO 4120:2004 Sensory Analysis - Methodology - Triangle Test
ISO 5495:2005 Sensory Analysis - Methodology - Paired
               Comparison Test
ISO 10399:2004 Sensory Analysis - Methodology - Duo-Trio
               Test




                               10th Sensometrics, Rotterdam, July 2010
Equivalence Testing - Background

In statistical hypothesis testing usually we have a distribution
under H0. The probability of observing a result in the tail
regions is low if H0 is true. This gives evidence to reject H0 at
the tails of the distribution.

How would a proper hypothesis test for equivalence be
constructed?
   H0: Products not equivalent
   H1: Products equivalent

What is the rejection region?

                                   10th Sensometrics, Rotterdam, July 2010
Equivalence Testing - Background
Consider the difference between two products evaluated for a
sensory attribute by line scale.
Typically we reject H0 in favour of H1 at the tails of the
distribution, which are improbable under H0.

H1 (equivalence)                 H1: Products equivalent
falls in the
center of the
distribution,
not at the tails.

How do we reject H0 in favour of H1?

                                10th Sensometrics, Rotterdam, July 2010
Power Approach

With the Power Approach, the difference hypothesis test is
re-applied to address the equivalence scenario:

   H0: Products not different
   H1: Products different




Shift focus now to sensory difference testing methodologies...


                                 10th Sensometrics, Rotterdam, July 2010
Equivalence Testing – Power Approach

                                    Truth
                        Different              Not

            Reject H0     Correct         Type I Error
                                               α
 Decision
            Retain H0   Type II Error        Correct
                             β


                          10th Sensometrics, Rotterdam, July 2010
Power Approach

With the Power Approach, power calculations are made to
determine an appropriate sample size.

The idea is to ensure that Type II error is improbable.
β is set at some low value.
Power (1-β) is high.




                                  10th Sensometrics, Rotterdam, July 2010
Power Approach

In hypothesis testing the research hypothesis is the
alternative hypothesis (H1), not the null hypothesis (H0).

Insufficient evidence to reject H0 means that it is retained.
It is not “proven” or “accepted”.

Neither p=0.86, nor p=0.06, nor any other p-value “proves”
H0.

The hypothesis test logic has been contorted to meet the
objectives.

                                  10th Sensometrics, Rotterdam, July 2010
Triangle Data Simulations - ASTM E1885-04

From Jian Bi’s publication “Similarity testing in sensory and
consumer research” (2005, FQ&P):

Select α=0.1 and β=0.05
Assumed proportion of detectors: pd=0.3
Proportion of correct responses:
              pc = pd + (1/3)(1-pd) = 0.533

Use “E-1885 04 Standard Test Method for Sensory Analysis –
Triangle Test” to determine the number of assessors.


                                  10th Sensometrics, Rotterdam, July 2010
Triangle Data Simulations - ASTM E1885-04




                                   54




                       10th Sensometrics, Rotterdam, July 2010
Triangle Data Simulations - ASTM E1885-04

Assume the following is true: the products are more similar
than we expected.

Proportion of detectors: pd=0.1
Proportion correct responses:
              pc = pd + (1/3)(1-pd) = 0.1+0.3 = 0.4

If the power approach works we would expect to confirm
similarity with high probability.



                                 10th Sensometrics, Rotterdam, July 2010
Triangle Critical Value - ASTM E1885-04

Retain H0 when the number of correct responses is less than
the number given in Table A1.2.

Standard indicates that values not in the table can be
obtained from normal approximation
xcrit = (n/3) + zα √ 2n/9




                                 10th Sensometrics, Rotterdam, July 2010
Triangle Data Simulations - ASTM E1885-04


     n=54



 Simulated data drawn from a population with a known
 proportion of detectors.




                              10th Sensometrics, Rotterdam, July 2010
Triangle Data Simulations - ASTM E1885-04




                        5000 sets



H0 is retained in some sets and rejected in others.
The power approach confirms similarity with probability 0.49.

                                 10th Sensometrics, Rotterdam, July 2010
Triangle Data Simulations - ASTM E1885-04

Table 1 in E1885-04 recommends a minimum of 457
assessors at α=0.1, β=0.05, pd=0.1.

Bi lets n=540 and re-runs the simulation to obtain 5000 sets.

H0 is retained in some sets and rejected in some others.
The power approach confirms similarity with probability 0.02.

This is not good.



                                 10th Sensometrics, Rotterdam, July 2010
Triangle Critical Value - ASTM E1885-04

As n becomes large standard error gets small (√p(1-p)/n ).
Probability of confirming similarity decreases.

Increased precision
      = increased probability of conclusion of difference
      = decreased probability of confirming similarity

Increasing n can be problematic.
In practice n is often increased to balance serving orders.



                                  10th Sensometrics, Rotterdam, July 2010
Equivalence Testing – Rejection Regions

                                 Retain H0
                                                 TOST
                                                                 Power
 s




                                                                 Approach


                       Reject H0 (equivalence)
Relationship between variance and rejection regions due to power approach (blue)
and TOST (red) in an equivalence test with two treatments for a bioavailability
variable (adapted from Schuirrmann, 1987). A similar issue exists with the power
approach involving binomial data (where rejection region will follow a step function).

                                             10th Sensometrics, Rotterdam, July 2010
Triangle Critical Value - ISO 4120:2004

ISO standard 4120:2004 also provides guidance for the
Triangle test. Selection of n follows the same procedure as
ASTM E1885-04.

ISO 4120:2004 provides a table and formula for maximum
correct responses for similarity testing significance:
       xcrit = { x | pd = (1.5(x/n)-0.5) + 1.5 zβ √ (nx-x2)/n3 }




                                  10th Sensometrics, Rotterdam, July 2010
Triangle Critical Values - ISO vs. ASTM

ISO tests whether CIupper<pd
pd is defined by the researcher.

ASTM tests whether the CI includes zero.
pd is defined by the researcher.
A CI within (0,pd) is not similar – zero must be included.

Using CI equations in E1885-04 Appendix X4 it is possible to
make decisions following the ISO guidelines.



                                   10th Sensometrics, Rotterdam, July 2010
Triangle Simulation for 3 methods

Set α=β=0.05 and pd=30%. Use n=66.

Let n={66, 660} and
pd = {40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 0%}.
2500 simulated datasets for each of the 18 scenarios.

Determine percentage of times that similarity is confirmed
according to methods provided in…
      (i) ASTM E1885-04
      (ii) ISO 4120:2004
      (iii) sensR::discrim()

                                 10th Sensometrics, Rotterdam, July 2010
Triangle Simulation for 3 methods

Percentages with which similarity confirmed when n=66
 Method             pd=40%    pd=35%     pd=30%       pd=25%    pd=20%
 ASTM E1885-04         0.20      1.36          5.16     13.20      28.00
 ISO 4120:2004         0.20      1.36          5.16     13.20      28.00
 sensR::discrim()      0.20      1.36          5.16     13.20      28.00

 Method             pd=15%    pd=10%     pd=5%        pd=0%
 ASTM E1885-04        50.36     71.08        85.64      95.12
 ISO 4120:2004        50.36     71.08        85.64      95.12
 sensR::discrim()     50.36     71.08        85.64      95.12



                                        10th Sensometrics, Rotterdam, July 2010
Triangle Simulation for 3 methods

Percentages with which similarity confirmed when n=660
 Method             pd=40%    pd=35%     pd=30%       pd=25%    pd=20%
 ASTM E1885-04         0.00      0.00          0.00      0.00       0.00
 ISO 4120:2004         0.00      0.04          8.92     64.68      97.80
 sensR::discrim()      0.00      0.00          5.00     51.12      95.64

 Method             pd=15%    pd=10%     pd=5%        pd=0%
 ASTM E1885-04         0.04      2.48        41.24      94.80
 ISO 4120:2004       100.00    100.00       100.00     100.00
 sensR::discrim()    100.00    100.00       100.00     100.00



                                        10th Sensometrics, Rotterdam, July 2010
Replicated Triangle Tests

Both ISO 4120:2004 and E1885-04 discourage the reader
from using replicated triangle tests.

Vague wording in E1885-04 suggests that such an analysis is
possible, but none is referenced.




                                10th Sensometrics, Rotterdam, July 2010
Similarity Testing – Test Statistic

Detection shown experimentally to be stochastic, not
deterministic (Ennis, 1993).

Binomial model still applies if assessors have identical
detection abilities. But assessor variance means that test
statistic follows different statistical distributions in H0 and H1.

Bi (2001) notes that it is incorrect for H0 and H1 to follow
different distributions – power and sample size calculations
based on the binomial are invalid when this principle is
violated.

                                    10th Sensometrics, Rotterdam, July 2010
Duo-Trio Simulation for 3 methods

Set α=β=0.05 and pd=30%. Use n=119.

Let n={119, 1190} and
pd = {40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 0%}.
2500 simulated datasets for each of the 18 scenarios.

Determine percentage of times that similarity is confirmed
according to methods provided in…
      (i) ASTM E2610-08
      (ii) ISO 10399:2004
      (iii) sensR::discrim()

                                 10th Sensometrics, Rotterdam, July 2010
Duo-Trio Simulation for 3 methods

Percentages with which similarity confirmed when n=119
 Method             pd=40%    pd=35%     pd=30%       pd=25%    pd=20%
 ASTM E2610-08         0.20      1.16          4.48     13.64      29.60
 ISO 10399:2004        0.20      1.16          4.48     13.64      29.60
 sensR::discrim()      0.20      1.16          4.48     13.64      29.60

 Method             pd=15%    pd=10%     pd=5%        pd=0%
 ASTM E2610-08         50.8     72.32          86.6     94.76
 ISO 10399:2004        50.8     72.32          86.6     94.76
 sensR::discrim()      50.8     72.32          86.6     94.76



                                        10th Sensometrics, Rotterdam, July 2010
Duo-Trio Simulation for 3 methods

Percentages with which similarity confirmed when n=1190
 Method             pd=40%    pd=35%     pd=30%       pd=25%    pd=20%
 ASTM E2610-08         0.00      0.00          0.00      0.00       0.00
 ISO 10399:2004        0.00      0.04          5.24     56.28      97.44
 sensR::discrim()      0.00      0.04          4.60     54.84      97.16

 Method             pd=15%    pd=10%     pd=5%        pd=0%
 ASTM E2610-08         0.00      3.68        46.80      95.68
 ISO 10399:2004      100.00    100.00       100.00     100.00
 sensR::discrim()    100.00    100.00       100.00     100.00



                                        10th Sensometrics, Rotterdam, July 2010
Equivalence Testing – Binomial Exact Solution

Equivalence Testing – null and alternative hypotheses
   H0: Products not equivalent
   H1: Products equivalent




Exact binomial solution from Ennis & Ennis (2008).




                                 10th Sensometrics, Rotterdam, July 2010
Equivalence Testing – Binomial Exact Solution

Equivalence and unsurpassed advertising claims using 2-AFC
addressed in “Tables for Parity Testing” (Ennis, 2008).
   H0: (p-0.5)2 ≥ 0.052
   H1: (p-0.5)2 < 0.052
Bounds defining equivalence are 45% and 55%, and true
choice probability is p.

Table values based on normal approximation given in
E1958-07 Standard Guide for Sensory Claim Substantiation.

Binomial Exact Test more limited in application than TOST.

                                 10th Sensometrics, Rotterdam, July 2010
Some key points…

•   Confidence intervals are much preferable to hypothesis test
    decision
•   Increasing n can have unintended consequences if
    following ASTM standards!
•   Power approach contorts hypothesis test logic
•   So far we are talking about equivalence of population
    average, not individual equivalence




                                 10th Sensometrics, Rotterdam, July 2010
Some key points…

•   Assumption that all assessors have same detection
    probability is not believable
•   Assumption that each assessor is either non-detector or
    detector is not believable
•   Some interest in the beta-binomial
•   Choose the best methods for the purpose
•   Assessor selection and test procedure very important
•   What do we really want to know?




                                 10th Sensometrics, Rotterdam, July 2010

More Related Content

Similar to Best Practices in Equivalence Testing

Lecture 05 - Statistical Analysis of Experimental Data.pptx
Lecture 05 - Statistical Analysis of Experimental Data.pptxLecture 05 - Statistical Analysis of Experimental Data.pptx
Lecture 05 - Statistical Analysis of Experimental Data.pptxAdelAbdelmaboud
 
statistics-for-analytical-chemistry (1).ppt
statistics-for-analytical-chemistry (1).pptstatistics-for-analytical-chemistry (1).ppt
statistics-for-analytical-chemistry (1).pptHalilIbrahimUlusoy
 
STA457 Assignment Liangkai Hu 999475884
STA457 Assignment Liangkai Hu 999475884STA457 Assignment Liangkai Hu 999475884
STA457 Assignment Liangkai Hu 999475884Liang Kai Hu
 
Hypothesis Test _One-sample t-test, Z-test, Proportion Z-test
Hypothesis Test _One-sample t-test, Z-test, Proportion Z-testHypothesis Test _One-sample t-test, Z-test, Proportion Z-test
Hypothesis Test _One-sample t-test, Z-test, Proportion Z-testRavindra Nath Shukla
 
Hypothesis Testing (Statistical Significance)1Hypo.docx
Hypothesis Testing (Statistical Significance)1Hypo.docxHypothesis Testing (Statistical Significance)1Hypo.docx
Hypothesis Testing (Statistical Significance)1Hypo.docxadampcarr67227
 
Introduction to the guide of uncertainty in measurement
Introduction to the guide of uncertainty in measurementIntroduction to the guide of uncertainty in measurement
Introduction to the guide of uncertainty in measurementMaurice Maeck
 
Data Analysis Of An Analytical Method Transfer To
Data Analysis Of An Analytical Method Transfer ToData Analysis Of An Analytical Method Transfer To
Data Analysis Of An Analytical Method Transfer ToDwayne Neal
 
chapter_8_20162.pdf
chapter_8_20162.pdfchapter_8_20162.pdf
chapter_8_20162.pdfSumitRijal1
 
Hypothesis Testing with ease
Hypothesis Testing with easeHypothesis Testing with ease
Hypothesis Testing with easeRupak Roy
 
1192012 155942 f023_=_statistical_inference
1192012 155942 f023_=_statistical_inference1192012 155942 f023_=_statistical_inference
1192012 155942 f023_=_statistical_inferenceDev Pandey
 
12 13 h2_measurement_ppt
12 13 h2_measurement_ppt12 13 h2_measurement_ppt
12 13 h2_measurement_pptTan Hong
 
Testing a claim about a standard deviation or variance
Testing a claim about a standard deviation or variance  Testing a claim about a standard deviation or variance
Testing a claim about a standard deviation or variance Long Beach City College
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Identification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation StudyIdentification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation Studyiosrjce
 
Nber Lecture Final
Nber Lecture FinalNber Lecture Final
Nber Lecture FinalNBER
 
Chapter 8 Section 2.ppt
Chapter 8 Section 2.pptChapter 8 Section 2.ppt
Chapter 8 Section 2.pptManoloTaquire
 
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTION
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTIONOPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTION
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTIONsipij
 

Similar to Best Practices in Equivalence Testing (20)

Lecture 05 - Statistical Analysis of Experimental Data.pptx
Lecture 05 - Statistical Analysis of Experimental Data.pptxLecture 05 - Statistical Analysis of Experimental Data.pptx
Lecture 05 - Statistical Analysis of Experimental Data.pptx
 
statistics-for-analytical-chemistry (1).ppt
statistics-for-analytical-chemistry (1).pptstatistics-for-analytical-chemistry (1).ppt
statistics-for-analytical-chemistry (1).ppt
 
STA457 Assignment Liangkai Hu 999475884
STA457 Assignment Liangkai Hu 999475884STA457 Assignment Liangkai Hu 999475884
STA457 Assignment Liangkai Hu 999475884
 
Hypothesis Test _One-sample t-test, Z-test, Proportion Z-test
Hypothesis Test _One-sample t-test, Z-test, Proportion Z-testHypothesis Test _One-sample t-test, Z-test, Proportion Z-test
Hypothesis Test _One-sample t-test, Z-test, Proportion Z-test
 
Hypothesis Testing (Statistical Significance)1Hypo.docx
Hypothesis Testing (Statistical Significance)1Hypo.docxHypothesis Testing (Statistical Significance)1Hypo.docx
Hypothesis Testing (Statistical Significance)1Hypo.docx
 
Davezies
DaveziesDavezies
Davezies
 
Introduction to the guide of uncertainty in measurement
Introduction to the guide of uncertainty in measurementIntroduction to the guide of uncertainty in measurement
Introduction to the guide of uncertainty in measurement
 
Data Analysis Of An Analytical Method Transfer To
Data Analysis Of An Analytical Method Transfer ToData Analysis Of An Analytical Method Transfer To
Data Analysis Of An Analytical Method Transfer To
 
chapter_8_20162.pdf
chapter_8_20162.pdfchapter_8_20162.pdf
chapter_8_20162.pdf
 
Hypothesis Testing with ease
Hypothesis Testing with easeHypothesis Testing with ease
Hypothesis Testing with ease
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
1192012 155942 f023_=_statistical_inference
1192012 155942 f023_=_statistical_inference1192012 155942 f023_=_statistical_inference
1192012 155942 f023_=_statistical_inference
 
12 13 h2_measurement_ppt
12 13 h2_measurement_ppt12 13 h2_measurement_ppt
12 13 h2_measurement_ppt
 
Testing a claim about a standard deviation or variance
Testing a claim about a standard deviation or variance  Testing a claim about a standard deviation or variance
Testing a claim about a standard deviation or variance
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Identification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation StudyIdentification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation Study
 
Nber Lecture Final
Nber Lecture FinalNber Lecture Final
Nber Lecture Final
 
Chapter 8 Section 2.ppt
Chapter 8 Section 2.pptChapter 8 Section 2.ppt
Chapter 8 Section 2.ppt
 
phd thesis presentation
phd thesis presentationphd thesis presentation
phd thesis presentation
 
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTION
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTIONOPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTION
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTION
 

More from Compusense Inc.

Global Collaboration Case Study
Global Collaboration Case StudyGlobal Collaboration Case Study
Global Collaboration Case StudyCompusense Inc.
 
Packaging and Concept Testing Case Study
Packaging and Concept Testing Case StudyPackaging and Concept Testing Case Study
Packaging and Concept Testing Case StudyCompusense Inc.
 
Instore-testing-case-study
Instore-testing-case-studyInstore-testing-case-study
Instore-testing-case-studyCompusense Inc.
 
Central location testing
Central location testingCentral location testing
Central location testingCompusense Inc.
 
Panel Recruitment and Scheduling Case Study
Panel Recruitment and Scheduling Case StudyPanel Recruitment and Scheduling Case Study
Panel Recruitment and Scheduling Case StudyCompusense Inc.
 
Consumer Driven Product Development for Export Markets
Consumer Driven Product Development for Export MarketsConsumer Driven Product Development for Export Markets
Consumer Driven Product Development for Export MarketsCompusense Inc.
 
A Preliminary Review of Multiple Group Principal Component Analysis for Descr...
A Preliminary Review of Multiple Group Principal Component Analysis for Descr...A Preliminary Review of Multiple Group Principal Component Analysis for Descr...
A Preliminary Review of Multiple Group Principal Component Analysis for Descr...Compusense Inc.
 
Panel Tracking via Thurstonian Modeling
Panel Tracking via Thurstonian ModelingPanel Tracking via Thurstonian Modeling
Panel Tracking via Thurstonian ModelingCompusense Inc.
 
Existing and new approaches for analysing data from Check All That Apply ques...
Existing and new approaches for analysing data from Check All That Apply ques...Existing and new approaches for analysing data from Check All That Apply ques...
Existing and new approaches for analysing data from Check All That Apply ques...Compusense Inc.
 
The power of calibrated descriptive sensory panels
The power of calibrated descriptive sensory panelsThe power of calibrated descriptive sensory panels
The power of calibrated descriptive sensory panelsCompusense Inc.
 
Sensory Considerations in BIB Design
Sensory Considerations in BIB DesignSensory Considerations in BIB Design
Sensory Considerations in BIB DesignCompusense Inc.
 
Segmentation of BIB Consumer Liking Data of 12 Cabernet Sauvignon Wines
Segmentation of BIB Consumer Liking Data of 12 Cabernet Sauvignon WinesSegmentation of BIB Consumer Liking Data of 12 Cabernet Sauvignon Wines
Segmentation of BIB Consumer Liking Data of 12 Cabernet Sauvignon WinesCompusense Inc.
 

More from Compusense Inc. (12)

Global Collaboration Case Study
Global Collaboration Case StudyGlobal Collaboration Case Study
Global Collaboration Case Study
 
Packaging and Concept Testing Case Study
Packaging and Concept Testing Case StudyPackaging and Concept Testing Case Study
Packaging and Concept Testing Case Study
 
Instore-testing-case-study
Instore-testing-case-studyInstore-testing-case-study
Instore-testing-case-study
 
Central location testing
Central location testingCentral location testing
Central location testing
 
Panel Recruitment and Scheduling Case Study
Panel Recruitment and Scheduling Case StudyPanel Recruitment and Scheduling Case Study
Panel Recruitment and Scheduling Case Study
 
Consumer Driven Product Development for Export Markets
Consumer Driven Product Development for Export MarketsConsumer Driven Product Development for Export Markets
Consumer Driven Product Development for Export Markets
 
A Preliminary Review of Multiple Group Principal Component Analysis for Descr...
A Preliminary Review of Multiple Group Principal Component Analysis for Descr...A Preliminary Review of Multiple Group Principal Component Analysis for Descr...
A Preliminary Review of Multiple Group Principal Component Analysis for Descr...
 
Panel Tracking via Thurstonian Modeling
Panel Tracking via Thurstonian ModelingPanel Tracking via Thurstonian Modeling
Panel Tracking via Thurstonian Modeling
 
Existing and new approaches for analysing data from Check All That Apply ques...
Existing and new approaches for analysing data from Check All That Apply ques...Existing and new approaches for analysing data from Check All That Apply ques...
Existing and new approaches for analysing data from Check All That Apply ques...
 
The power of calibrated descriptive sensory panels
The power of calibrated descriptive sensory panelsThe power of calibrated descriptive sensory panels
The power of calibrated descriptive sensory panels
 
Sensory Considerations in BIB Design
Sensory Considerations in BIB DesignSensory Considerations in BIB Design
Sensory Considerations in BIB Design
 
Segmentation of BIB Consumer Liking Data of 12 Cabernet Sauvignon Wines
Segmentation of BIB Consumer Liking Data of 12 Cabernet Sauvignon WinesSegmentation of BIB Consumer Liking Data of 12 Cabernet Sauvignon Wines
Segmentation of BIB Consumer Liking Data of 12 Cabernet Sauvignon Wines
 

Recently uploaded

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Best Practices in Equivalence Testing

  • 1. Best Practices in Equivalence Testing John Castura Compusense Inc. 10th Sensometrics, Rotterdam, July 2010
  • 5. Equivalence Testing – Purposes • Reformulation • e.g. Ingredient substitution • Research and Development • e.g. Product matching •Claims Substantiation • e.g. Detergent X cleans equivalently to the leading brand 10th Sensometrics, Rotterdam, July 2010
  • 6. ASTM E1958–07 Standard Guide for Sensory Claim Substantiation Comparative Superiority Parity Equality / Equivalence Unsurpassed / Non-inferiority Non-Comparative 10th Sensometrics, Rotterdam, July 2010
  • 7. ASTM E1885-04 Standard Test Method for Sensory Analysis - Triangle Test E1958-08 Standard Guide for Sensory Claim Substantiation E2139-05 Standard Test Method for Same-Different Test E2164-08 Standard Test Method for Directional Difference Test E2610-08 Standard Test Method for Sensory Analysis - Duo-Trio Test 10th Sensometrics, Rotterdam, July 2010
  • 8. ISO ISO 4120:2004 Sensory Analysis - Methodology - Triangle Test ISO 5495:2005 Sensory Analysis - Methodology - Paired Comparison Test ISO 10399:2004 Sensory Analysis - Methodology - Duo-Trio Test 10th Sensometrics, Rotterdam, July 2010
  • 9. Equivalence Testing - Background In statistical hypothesis testing usually we have a distribution under H0. The probability of observing a result in the tail regions is low if H0 is true. This gives evidence to reject H0 at the tails of the distribution. How would a proper hypothesis test for equivalence be constructed? H0: Products not equivalent H1: Products equivalent What is the rejection region? 10th Sensometrics, Rotterdam, July 2010
  • 10. Equivalence Testing - Background Consider the difference between two products evaluated for a sensory attribute by line scale. Typically we reject H0 in favour of H1 at the tails of the distribution, which are improbable under H0. H1 (equivalence) H1: Products equivalent falls in the center of the distribution, not at the tails. How do we reject H0 in favour of H1? 10th Sensometrics, Rotterdam, July 2010
  • 11. Power Approach With the Power Approach, the difference hypothesis test is re-applied to address the equivalence scenario: H0: Products not different H1: Products different Shift focus now to sensory difference testing methodologies... 10th Sensometrics, Rotterdam, July 2010
  • 12. Equivalence Testing – Power Approach Truth Different Not Reject H0 Correct Type I Error α Decision Retain H0 Type II Error Correct β 10th Sensometrics, Rotterdam, July 2010
  • 13. Power Approach With the Power Approach, power calculations are made to determine an appropriate sample size. The idea is to ensure that Type II error is improbable. β is set at some low value. Power (1-β) is high. 10th Sensometrics, Rotterdam, July 2010
  • 14. Power Approach In hypothesis testing the research hypothesis is the alternative hypothesis (H1), not the null hypothesis (H0). Insufficient evidence to reject H0 means that it is retained. It is not “proven” or “accepted”. Neither p=0.86, nor p=0.06, nor any other p-value “proves” H0. The hypothesis test logic has been contorted to meet the objectives. 10th Sensometrics, Rotterdam, July 2010
  • 15. Triangle Data Simulations - ASTM E1885-04 From Jian Bi’s publication “Similarity testing in sensory and consumer research” (2005, FQ&P): Select α=0.1 and β=0.05 Assumed proportion of detectors: pd=0.3 Proportion of correct responses: pc = pd + (1/3)(1-pd) = 0.533 Use “E-1885 04 Standard Test Method for Sensory Analysis – Triangle Test” to determine the number of assessors. 10th Sensometrics, Rotterdam, July 2010
  • 16. Triangle Data Simulations - ASTM E1885-04 54 10th Sensometrics, Rotterdam, July 2010
  • 17. Triangle Data Simulations - ASTM E1885-04 Assume the following is true: the products are more similar than we expected. Proportion of detectors: pd=0.1 Proportion correct responses: pc = pd + (1/3)(1-pd) = 0.1+0.3 = 0.4 If the power approach works we would expect to confirm similarity with high probability. 10th Sensometrics, Rotterdam, July 2010
  • 18. Triangle Critical Value - ASTM E1885-04 Retain H0 when the number of correct responses is less than the number given in Table A1.2. Standard indicates that values not in the table can be obtained from normal approximation xcrit = (n/3) + zα √ 2n/9 10th Sensometrics, Rotterdam, July 2010
  • 19. Triangle Data Simulations - ASTM E1885-04 n=54 Simulated data drawn from a population with a known proportion of detectors. 10th Sensometrics, Rotterdam, July 2010
  • 20. Triangle Data Simulations - ASTM E1885-04 5000 sets H0 is retained in some sets and rejected in others. The power approach confirms similarity with probability 0.49. 10th Sensometrics, Rotterdam, July 2010
  • 21. Triangle Data Simulations - ASTM E1885-04 Table 1 in E1885-04 recommends a minimum of 457 assessors at α=0.1, β=0.05, pd=0.1. Bi lets n=540 and re-runs the simulation to obtain 5000 sets. H0 is retained in some sets and rejected in some others. The power approach confirms similarity with probability 0.02. This is not good. 10th Sensometrics, Rotterdam, July 2010
  • 22. Triangle Critical Value - ASTM E1885-04 As n becomes large standard error gets small (√p(1-p)/n ). Probability of confirming similarity decreases. Increased precision = increased probability of conclusion of difference = decreased probability of confirming similarity Increasing n can be problematic. In practice n is often increased to balance serving orders. 10th Sensometrics, Rotterdam, July 2010
  • 23. Equivalence Testing – Rejection Regions Retain H0 TOST Power s Approach Reject H0 (equivalence) Relationship between variance and rejection regions due to power approach (blue) and TOST (red) in an equivalence test with two treatments for a bioavailability variable (adapted from Schuirrmann, 1987). A similar issue exists with the power approach involving binomial data (where rejection region will follow a step function). 10th Sensometrics, Rotterdam, July 2010
  • 24. Triangle Critical Value - ISO 4120:2004 ISO standard 4120:2004 also provides guidance for the Triangle test. Selection of n follows the same procedure as ASTM E1885-04. ISO 4120:2004 provides a table and formula for maximum correct responses for similarity testing significance: xcrit = { x | pd = (1.5(x/n)-0.5) + 1.5 zβ √ (nx-x2)/n3 } 10th Sensometrics, Rotterdam, July 2010
  • 25. Triangle Critical Values - ISO vs. ASTM ISO tests whether CIupper<pd pd is defined by the researcher. ASTM tests whether the CI includes zero. pd is defined by the researcher. A CI within (0,pd) is not similar – zero must be included. Using CI equations in E1885-04 Appendix X4 it is possible to make decisions following the ISO guidelines. 10th Sensometrics, Rotterdam, July 2010
  • 26. Triangle Simulation for 3 methods Set α=β=0.05 and pd=30%. Use n=66. Let n={66, 660} and pd = {40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 0%}. 2500 simulated datasets for each of the 18 scenarios. Determine percentage of times that similarity is confirmed according to methods provided in… (i) ASTM E1885-04 (ii) ISO 4120:2004 (iii) sensR::discrim() 10th Sensometrics, Rotterdam, July 2010
  • 27. Triangle Simulation for 3 methods Percentages with which similarity confirmed when n=66 Method pd=40% pd=35% pd=30% pd=25% pd=20% ASTM E1885-04 0.20 1.36 5.16 13.20 28.00 ISO 4120:2004 0.20 1.36 5.16 13.20 28.00 sensR::discrim() 0.20 1.36 5.16 13.20 28.00 Method pd=15% pd=10% pd=5% pd=0% ASTM E1885-04 50.36 71.08 85.64 95.12 ISO 4120:2004 50.36 71.08 85.64 95.12 sensR::discrim() 50.36 71.08 85.64 95.12 10th Sensometrics, Rotterdam, July 2010
  • 28. Triangle Simulation for 3 methods Percentages with which similarity confirmed when n=660 Method pd=40% pd=35% pd=30% pd=25% pd=20% ASTM E1885-04 0.00 0.00 0.00 0.00 0.00 ISO 4120:2004 0.00 0.04 8.92 64.68 97.80 sensR::discrim() 0.00 0.00 5.00 51.12 95.64 Method pd=15% pd=10% pd=5% pd=0% ASTM E1885-04 0.04 2.48 41.24 94.80 ISO 4120:2004 100.00 100.00 100.00 100.00 sensR::discrim() 100.00 100.00 100.00 100.00 10th Sensometrics, Rotterdam, July 2010
  • 29. Replicated Triangle Tests Both ISO 4120:2004 and E1885-04 discourage the reader from using replicated triangle tests. Vague wording in E1885-04 suggests that such an analysis is possible, but none is referenced. 10th Sensometrics, Rotterdam, July 2010
  • 30. Similarity Testing – Test Statistic Detection shown experimentally to be stochastic, not deterministic (Ennis, 1993). Binomial model still applies if assessors have identical detection abilities. But assessor variance means that test statistic follows different statistical distributions in H0 and H1. Bi (2001) notes that it is incorrect for H0 and H1 to follow different distributions – power and sample size calculations based on the binomial are invalid when this principle is violated. 10th Sensometrics, Rotterdam, July 2010
  • 31. Duo-Trio Simulation for 3 methods Set α=β=0.05 and pd=30%. Use n=119. Let n={119, 1190} and pd = {40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 0%}. 2500 simulated datasets for each of the 18 scenarios. Determine percentage of times that similarity is confirmed according to methods provided in… (i) ASTM E2610-08 (ii) ISO 10399:2004 (iii) sensR::discrim() 10th Sensometrics, Rotterdam, July 2010
  • 32. Duo-Trio Simulation for 3 methods Percentages with which similarity confirmed when n=119 Method pd=40% pd=35% pd=30% pd=25% pd=20% ASTM E2610-08 0.20 1.16 4.48 13.64 29.60 ISO 10399:2004 0.20 1.16 4.48 13.64 29.60 sensR::discrim() 0.20 1.16 4.48 13.64 29.60 Method pd=15% pd=10% pd=5% pd=0% ASTM E2610-08 50.8 72.32 86.6 94.76 ISO 10399:2004 50.8 72.32 86.6 94.76 sensR::discrim() 50.8 72.32 86.6 94.76 10th Sensometrics, Rotterdam, July 2010
  • 33. Duo-Trio Simulation for 3 methods Percentages with which similarity confirmed when n=1190 Method pd=40% pd=35% pd=30% pd=25% pd=20% ASTM E2610-08 0.00 0.00 0.00 0.00 0.00 ISO 10399:2004 0.00 0.04 5.24 56.28 97.44 sensR::discrim() 0.00 0.04 4.60 54.84 97.16 Method pd=15% pd=10% pd=5% pd=0% ASTM E2610-08 0.00 3.68 46.80 95.68 ISO 10399:2004 100.00 100.00 100.00 100.00 sensR::discrim() 100.00 100.00 100.00 100.00 10th Sensometrics, Rotterdam, July 2010
  • 34. Equivalence Testing – Binomial Exact Solution Equivalence Testing – null and alternative hypotheses H0: Products not equivalent H1: Products equivalent Exact binomial solution from Ennis & Ennis (2008). 10th Sensometrics, Rotterdam, July 2010
  • 35. Equivalence Testing – Binomial Exact Solution Equivalence and unsurpassed advertising claims using 2-AFC addressed in “Tables for Parity Testing” (Ennis, 2008). H0: (p-0.5)2 ≥ 0.052 H1: (p-0.5)2 < 0.052 Bounds defining equivalence are 45% and 55%, and true choice probability is p. Table values based on normal approximation given in E1958-07 Standard Guide for Sensory Claim Substantiation. Binomial Exact Test more limited in application than TOST. 10th Sensometrics, Rotterdam, July 2010
  • 36. Some key points… • Confidence intervals are much preferable to hypothesis test decision • Increasing n can have unintended consequences if following ASTM standards! • Power approach contorts hypothesis test logic • So far we are talking about equivalence of population average, not individual equivalence 10th Sensometrics, Rotterdam, July 2010
  • 37. Some key points… • Assumption that all assessors have same detection probability is not believable • Assumption that each assessor is either non-detector or detector is not believable • Some interest in the beta-binomial • Choose the best methods for the purpose • Assessor selection and test procedure very important • What do we really want to know? 10th Sensometrics, Rotterdam, July 2010