Hypothesis testing of sample means (flowchart)

On this page we give the flowchart for testing means of independent samples. For instance, the set of temperature measurements over a 10-year period for all days in July is essentially independent of the set of temperature measurements over a 10-year period for all days in January. An example of non-independent samples is the measurement of tumor size in 100 cancer patients before and after some cancer treatment; the final tumor size will of course be somewhat (or strongly) correlated with the tumor size at the beginning of treatment.

Is your data count data (i.e., the number of something counted per day, per week, etc.), with counts low enough that the stochasticity in the data is not in the Normal regime?

Yes, this is low-count count data

  • You are on the wrong page.  Go here

No, this either is not count data, or it is high-count count data

  • Are you testing if the mean of just one sample is consistent with some value, mu?
  • Yes, I am testing if just one mean is consistent with some value
    • Does the sample have at least N~10 measurements used to calculate the mean (i.e., does the Central Limit Theorem apply)?
    • Yes, there are at least 10 measurements
      1. Calculate the sample mean, bar(X) and standard error on the mean, SE
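      2. Calculate the Z statistic, Z = (bar(X) - mu)/SE
      3. Use pnorm(Z) in R to calculate the p-value. Note that pnorm(Z) returns the lower-tail probability, so a Z far below zero gives a value near 0 and a Z far above zero gives a value near 1.
      4. The p-value tests the null hypothesis that the true mean of the probability distribution underlying the sample is consistent with mu.
      5. Reject the null hypothesis if the p-value is close to 0 or 1 (i.e., within 0.05 of 0 or 1). Rejecting in both tails this way is a two-sided test at the 10% significance level; for a two-sided test at the 5% level, reject when the p-value is within 0.025 of 0 or 1.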
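The one-sample Z test steps above can be sketched in R as follows. This is only a minimal illustration: the data values and the choice mu = 5.0 are made up, and the standard error is taken to be the usual sd/sqrt(N). The last line also shows the conventional two-sided p-value, which carries the same information as checking whether pnorm(Z) is close to 0 or 1.

```r
# Hypothetical example data: 12 measurements (values invented for illustration)
x  <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.0, 5.4, 5.7, 5.3, 5.5, 4.8, 5.9)
mu <- 5.0                           # assumed value to test the mean against

xbar <- mean(x)                     # sample mean, bar(X)
se   <- sd(x) / sqrt(length(x))     # standard error on the mean, SE
z    <- (xbar - mu) / se            # Z statistic

p_lower     <- pnorm(z)             # lower-tail probability, as in step 3
p_two_sided <- 2 * pnorm(-abs(z))   # equivalent conventional two-sided p-value

cat("Z =", z, " pnorm(Z) =", p_lower, " two-sided p =", p_two_sided, "\n")
```

With this convention, p_two_sided < 0.05 is the same condition as pnorm(Z) being within 0.025 of 0 or 1.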
    • No, there are not at least ~10 measurements
      • Do you think the data are likely consistent with being Normally distributed?
      • Yes, the data are Normally distributed
        1. Calculate the sample mean, bar(X) and standard error on the mean, SE
        2. Calculate the t statistic, t = (bar(X) - mu)/SE
        3. Calculate the number of degrees of freedom, df = N-1
        4. To calculate the p-value, use pt(t,df) in R (this is the lower-tail probability of the Student t distribution)
        5. The p-value tests the null hypothesis that the true mean of the probability distribution underlying the sample is consistent with mu
        6. Reject the null hypothesis if the p-value is close to 0 or close to 1 (i.e., within 0.05 of 0 or 1, which is a two-sided test at the 10% significance level)
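A minimal sketch of the one-sample t test steps above, assuming made-up data and an assumed mu = 5.0. The hand calculation is compared against R's built-in t.test(), which reports the conventional two-sided p-value rather than the lower-tail probability pt(t,df).

```r
# Hypothetical small sample (values invented for illustration)
x  <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.0)
mu <- 5.0                               # assumed value to test the mean against

n      <- length(x)
xbar   <- mean(x)                       # sample mean, bar(X)
se     <- sd(x) / sqrt(n)               # standard error on the mean, SE
t_stat <- (xbar - mu) / se              # t statistic
df     <- n - 1                         # degrees of freedom

p_lower     <- pt(t_stat, df)           # lower-tail probability, as in step 4
p_two_sided <- 2 * pt(-abs(t_stat), df) # conventional two-sided p-value

# Cross-check against R's built-in one-sample t test
check <- t.test(x, mu = mu)
cat("t =", t_stat, " df =", df, " pt(t,df) =", p_lower,
    " two-sided p =", p_two_sided, "\n")
cat("t.test() gives t =", unname(check$statistic),
    " p =", check$p.value, "\n")
```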
      • No, the data aren’t Normally distributed
        • Beyond the scope of this course because it involves likelihood methods (but note that, rightly or wrongly, people usually just assume that small samples of data are in fact Normally distributed, and go ahead and use the t-test)
  • No, I’m testing equality of more than one mean
    • Are you testing if means of two samples are consistent with being equal?
    • Yes, I am testing just two means
      • Does each sample have at least N~10 measurements used to calculate the mean (i.e., does the CLT apply)?
      • Yes, there are at least 10 measurements used to calculate both means
        • Calculate the sample mean, bar(X), and standard error on the mean, SE for both samples
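        • Calculate the Z statistic
        • Use pnorm(Z) in R to calculate the p-value (pnorm returns the lower-tail probability, so values near 0 or near 1 both indicate a large difference between the means)
        • The p-value tests the null hypothesis that the true means of the probability distributions underlying the two samples are consistent with being equal
        • Reject the null hypothesis if the p-value is close to 0 or close to 1 (i.e., within 0.05 of 0 or 1, which is a two-sided test at the 10% significance level)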
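The two-sample Z test steps above might look like the following in R. The form of the Z statistic is not given on this page, so the sketch assumes the standard one for independent samples, Z = (bar(X)_1 - bar(X)_2)/sqrt(SE_1^2 + SE_2^2). The data values are made up.

```r
# Hypothetical example data: two independent samples (values invented)
x1 <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.0, 5.4, 5.7, 5.3, 5.5)
x2 <- c(4.8, 5.0, 4.6, 5.1, 4.9, 4.7, 5.2, 4.5, 5.0, 4.8, 4.9)

xbar1 <- mean(x1)                      # sample means
xbar2 <- mean(x2)
se1   <- sd(x1) / sqrt(length(x1))     # standard errors on the means
se2   <- sd(x2) / sqrt(length(x2))

# Assumed form of the two-sample Z statistic for independent samples
z <- (xbar1 - xbar2) / sqrt(se1^2 + se2^2)

p_lower     <- pnorm(z)                # lower-tail probability, as in the steps
p_two_sided <- 2 * pnorm(-abs(z))      # conventional two-sided p-value

cat("Z =", z, " pnorm(Z) =", p_lower, " two-sided p =", p_two_sided, "\n")
```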
      • No, there are fewer than ~10 measurements used to calculate the mean of one or both of the samples
        • Are the data likely consistent with being Normally distributed?
        • Yes, the data appear to be Normally distributed
          • Calculate the sample mean, bar(X), and standard error on the mean, SE for both samples
          • Calculate the t statistic
          • Calculate the degrees of freedom of the t statistic
          • Use pt(t,df) in R to calculate the p-value (this is the lower-tail probability of the Student t distribution)
          • The p-value tests the null hypothesis that the true means of the probability distributions underlying the samples are consistent with being equal
          • Reject the null hypothesis if the p-value is close to 0 or close to 1 (i.e., within 0.05 of 0 or 1, which is a two-sided test at the 10% significance level)
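A sketch of the two-sample t test steps above. The page does not say which form of the t statistic and degrees of freedom is intended, so this example assumes Welch's version (unequal variances, with the Welch-Satterthwaite degrees of freedom), which is also what R's t.test() does by default. The pooled-variance version, t.test(x1, x2, var.equal = TRUE), would be the alternative. The data values are made up.

```r
# Hypothetical small samples (values invented for illustration)
x1 <- c(5.1, 4.9, 5.6, 5.8, 5.2)
x2 <- c(4.4, 4.7, 5.0, 4.6, 4.9, 4.5)

n1  <- length(x1)
n2  <- length(x2)
se1 <- sd(x1) / sqrt(n1)               # standard errors on the means
se2 <- sd(x2) / sqrt(n2)

# Assumed (Welch) form of the t statistic and its degrees of freedom
t_stat <- (mean(x1) - mean(x2)) / sqrt(se1^2 + se2^2)
df     <- (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))

p_lower     <- pt(t_stat, df)           # lower-tail probability, as in the steps
p_two_sided <- 2 * pt(-abs(t_stat), df) # conventional two-sided p-value

# Cross-check against R's built-in Welch two-sample t test
check <- t.test(x1, x2)
cat("t =", t_stat, " df =", df, " pt(t,df) =", p_lower,
    " two-sided p =", p_two_sided, "\n")
```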
        • No, the data are not Normally distributed
          • Beyond the scope of this course (involves likelihood methods)
    • No, I am testing the equality of means of more than two samples
      • Does each of the samples have at least 10 measurements (i.e., does the CLT apply)?
      • Yes
        • Calculate the sample mean, bar(X) and standard error on the mean, SE, of all samples.
        • Calculate the overall mean bar(bar(X)) of the combined samples
        • Calculate the chi-square statistic, Q
        • Calculate the number of degrees of freedom, df = Nsample-1, where Nsample is the number of samples
        • Use pchisq(Q,df) in R to calculate the p-value (pchisq returns the lower-tail probability, so a large Q gives a value near 1)
        • The p-value tests the null hypothesis that the means of the probability distributions underlying each of the samples are consistent with being equal
        • Reject the null hypothesis if the p-value is close to 1 (i.e., greater than 0.95)
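The steps above for more than two samples can be sketched as below. The form of the chi-square statistic is not given on this page, so the example assumes Q = sum over samples of (bar(X)_i - bar(bar(X)))^2/SE_i^2, with bar(bar(X)) taken as the mean of all measurements combined. The three samples are made up.

```r
# Hypothetical example data: three independent samples (values invented)
samples <- list(
  a = c(5.1, 4.9, 5.6, 5.8, 5.2, 5.0, 5.4, 5.7, 5.3, 5.5),
  b = c(4.8, 5.0, 4.6, 5.1, 4.9, 4.7, 5.2, 4.5, 5.0, 4.8),
  c = c(5.3, 5.6, 5.1, 5.9, 5.4, 5.7, 5.2, 5.8, 5.5, 5.6)
)

xbar    <- sapply(samples, mean)                     # sample means
se      <- sapply(samples, function(s) sd(s) / sqrt(length(s)))  # SE of each
xbarbar <- mean(unlist(samples))                     # overall mean, combined

# Assumed form of the chi-square statistic
Q  <- sum((xbar - xbarbar)^2 / se^2)
df <- length(samples) - 1                            # df = Nsample - 1

p_lower <- pchisq(Q, df)                             # lower tail, as in the steps
p_upper <- pchisq(Q, df, lower.tail = FALSE)         # conventional upper-tail p

cat("Q =", Q, " df =", df, " pchisq(Q,df) =", p_lower,
    " upper-tail p =", p_upper, "\n")
```

Rejecting when pchisq(Q,df) is greater than 0.95 is the same as rejecting when the upper-tail probability is less than 0.05.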
      • No
        • Beyond the scope of this course (involves likelihood methods)

 
