Environmental Data Analysis BC ENV 3017

Statistics 3

Significance tests

Suppose you want to find out whether a set of data is statistically different from a certain value. In this case you need to perform a significance test.

There are two ways to determine whether samples are different from each other or from a certain value: one uses confidence intervals, the other uses hypothesis testing as its main concept.

Confidence intervals

Let us assume you want to calibrate an ozone detector. Your standard has an ozone concentration of 70 ppb. Your measurements were: 78, 83, 68, 72, 88 ppb; average: 77.80, SD = 8.07. Is this significantly different from 70?

One way to address this is to calculate the confidence interval and see whether the expected value falls within it. The 95% confidence interval turns out to be 77.80 ± 10.02, which includes 70 ppb. Therefore we cannot say that the instrument has drifted. But of course we could perform more measurements (increase n) and look at the question more carefully.
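As a minimal sketch of the calculation above, the following Python snippet recomputes the interval from the five measurements. The critical value 2.776 is taken from a standard t table (95%, two-sided, 4 degrees of freedom) and is an assumption about how the ± 10.02 was obtained; the variable names are illustrative only.

```python
import math
import statistics

measurements = [78, 83, 68, 72, 88]   # ozone readings in ppb
expected = 70                          # concentration of the standard in ppb

n = len(measurements)
mean = statistics.mean(measurements)   # average of the readings
sd = statistics.stdev(measurements)    # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(n)                 # standard error of the average

# Assumption: critical t value read from a printed table
# for 95% confidence and n - 1 = 4 degrees of freedom.
T_CRIT_95_DF4 = 2.776
half_width = T_CRIT_95_DF4 * se

lower, upper = mean - half_width, mean + half_width
inside = lower <= expected <= upper

print(f"average = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
print(f"95% confidence interval = {mean:.2f} +/- {half_width:.2f}")
print("expected value inside interval:", inside)
```

Because the half-width is proportional to SD divided by the square root of n, taking more measurements narrows the interval, which is why increasing n lets us look at the question more carefully.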

Another example uses PM counts obtained at one place but counted by different students. The counts were as follows.

student1 student2 difference
52 51 1
53 52 1
54 53 1
55 54 1
53 50 3

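So here we want to know whether there is a significant difference between the two students looking at the same sample. Because each row is a pair of counts of the same sample, we can calculate the difference for each pair and check whether the confidence interval of the average difference includes zero.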
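The paired comparison can be sketched in Python as follows. This is only an illustration: it assumes a 95% confidence level and uses the table value 2.776 for 4 degrees of freedom; the variable names are not from the course material.

```python
import math
import statistics

student1 = [52, 53, 54, 55, 53]
student2 = [51, 52, 53, 54, 50]

# Each row is the same sample counted twice, so we work with row-wise differences.
differences = [a - b for a, b in zip(student1, student2)]

n = len(differences)
mean_diff = statistics.mean(differences)
se_diff = statistics.stdev(differences) / math.sqrt(n)

# Assumption: 95% confidence, n - 1 = 4 degrees of freedom (table value).
T_CRIT_95_DF4 = 2.776
half_width = T_CRIT_95_DF4 * se_diff

lower, upper = mean_diff - half_width, mean_diff + half_width
zero_inside = lower <= 0 <= upper

print(f"average difference = {mean_diff:.2f} +/- {half_width:.2f}")
print("zero inside interval:", zero_inside)
```

With these numbers the interval is roughly 1.4 ± 1.1 and does not include zero, which would suggest a systematic difference between the two students.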
We can also look at the same results by assuming that the samples were measured by the same student but reflect different samples:

sample1 sample2
52 51
53 52
54 53
55 54
53 50

Now it does not make sense to calculate a difference for each pair (row), because the counts are unrelated.
So in this case we would compare the averages of the two samples instead.
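One possible way to treat the two columns as unrelated samples is sketched below: the difference between the two averages is compared with its standard error. The choice of degrees of freedom here (the smaller sample size minus 1) is a cautious simplification assumed for this sketch, not something stated in these notes; it gives a somewhat wider interval than more refined formulas.

```python
import math
import statistics

sample1 = [52, 53, 54, 55, 53]
sample2 = [51, 52, 53, 54, 50]

n1, n2 = len(sample1), len(sample2)
diff_of_means = statistics.mean(sample1) - statistics.mean(sample2)

# Standard error of the difference between two independent averages.
se_diff = math.sqrt(
    statistics.variance(sample1) / n1 + statistics.variance(sample2) / n2
)

# Assumption: cautious degrees of freedom min(n1, n2) - 1 = 4,
# with the 95% critical value 2.776 from a t table.
T_CRIT_95_DF4 = 2.776
half_width = T_CRIT_95_DF4 * se_diff

lower, upper = diff_of_means - half_width, diff_of_means + half_width
zero_inside = lower <= 0 <= upper

print(f"difference of averages = {diff_of_means:.2f} +/- {half_width:.2f}")
print("zero inside interval:", zero_inside)
```

Here the interval is roughly 1.4 ± 2.4 and includes zero, so the same numbers that point to a difference when treated as pairs no longer do so when treated as unrelated samples.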

Hypothesis testing

Another approach is to use the concept of hypothesis testing to address the questions raised above and perform a formal t-test.

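Let us assume you want to calibrate the ozone detector. Your standard has an ozone concentration of 70 ppb. Your measurements were: 78, 83, 68, 72, 88 ppb; average: 77.80, SD = 8.07. Is this significantly different from 70? In order to perform this test, we need to formulate the problem as a hypothesis:

Null hypothesis: the true average of the measurements is 70 ppb, and the observed difference is due to chance.
Alternative hypothesis: the true average of the measurements is different from 70 ppb.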

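The significance test investigates whether the null hypothesis can be rejected. It yields a probability (P-value): the probability of obtaining a result at least as far from the expected value as the one we observed, assuming the null hypothesis is true. Note that this is not the probability that the null hypothesis itself is true. In other words, we are testing the following question: if the true average really is the expected value, what is the probability of obtaining an average as different as ours (or more) by chance?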

The t-statistic measures the difference between the data and the expected value in units of the standard error (SE). We are 2.2 SE units off the expected value! What is the probability that we are that far off (or more) from the expected value? We can use the normal curve to estimate this probability if n is large (fig).
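The value of about 2.2 SE units can be reproduced with a few lines of Python. This is a small sketch using the ozone measurements from above; the variable names are illustrative.

```python
import math
import statistics

measurements = [78, 83, 68, 72, 88]   # ozone readings in ppb
expected = 70                          # concentration of the standard in ppb

# Standard error of the average: SD divided by the square root of n.
se = statistics.stdev(measurements) / math.sqrt(len(measurements))

# t-statistic: distance between observed average and expected value, in SE units.
t_statistic = (statistics.mean(measurements) - expected) / se

print(f"SE = {se:.2f}, t = {t_statistic:.2f}")
```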

Because we deal here with a small number of measurements, we need to use Student's t curve, which looks very similar to the normal curve (fig). (Use of the normal distribution is only justified if we have a very large sample size or if we know the true SD (σ). In case we can use the normal distribution, the t-statistic mentioned above would be called the z-statistic.)

When you use Student's t curve, you will be asked to give the degrees of freedom:

degrees of freedom = number of measurements - 1

In our example, we need to use the two-tailed t-test (TDIST). We get a P-value of 0.097 or 9.7%. The cut-off value is typically chosen at 5%. If the probability is below 5%, we reject the null hypothesis. In our case, we cannot reject the null hypothesis. That means there is no significant difference between our observed and expected value. By performing more measurements, we can reduce the SE and check more precisely whether our instrument is systematically off.
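To show where such a P-value comes from without a spreadsheet, the sketch below integrates Student's t curve numerically using only the Python standard library. The helper names `t_pdf` and `two_tailed_p` are made up for this illustration, and the numerical integration (Simpson's rule) is simply one convenient way to get the area under the curve; the result should correspond to a two-tailed TDIST calculation.

```python
import math
import statistics

def t_pdf(x, df):
    """Height of Student's t curve at x for df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Area under the t curve outside the range -|t| .. +|t|.

    The area between 0 and |t| is found with Simpson's rule (steps must be even)
    and doubled, because the curve is symmetric around zero.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3
    return 1 - central_area

measurements = [78, 83, 68, 72, 88]   # ozone readings in ppb
expected = 70

n = len(measurements)
se = statistics.stdev(measurements) / math.sqrt(n)
t_statistic = (statistics.mean(measurements) - expected) / se
df = n - 1                             # degrees of freedom

p_value = two_tailed_p(t_statistic, df)
reject_null = p_value < 0.05           # 5% cut-off

print(f"t = {t_statistic:.2f}, df = {df}, P = {p_value:.3f}")
print("reject null hypothesis:", reject_null)
```

The P-value comes out at about 0.097, above the 5% cut-off, so the null hypothesis is not rejected. This agrees with the confidence interval approach, where 70 ppb fell inside the interval.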

This test can also be used to investigate whether there are systematic differences between experiments, for example between samples obtained at different sites. Depending on certain constraints, you need to use one of the following t-tests that Excel offers or use the equivalent StatPlus functions:

paired two-sample test for means
two-sample test assuming equal variances
two-sample test assuming unequal variances

Usually you do not know whether the variances are equal or not. In that case, just use the "unequal" test. It also gives a valid answer when the variances are equal. The best way to understand these cases is to go through examples.
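As one such example, the sketch below runs both a paired test and an unequal-variance test on the PM counts from the tables above. It is an illustration written with the Python standard library rather than Excel or StatPlus; the helper functions are hypothetical names, and the degrees of freedom for the unequal-variance case use the Welch-Satterthwaite approximation, which is assumed here to be what an "unequal variances" test does.

```python
import math
import statistics

def t_pdf(x, df):
    """Height of Student's t curve at x for df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value from Simpson's rule integration of the t curve."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

first = [52, 53, 54, 55, 53]
second = [51, 52, 53, 54, 50]
n1, n2 = len(first), len(second)

# Paired test: each row belongs together (same sample, two students).
differences = [a - b for a, b in zip(first, second)]
se_paired = statistics.stdev(differences) / math.sqrt(n1)
t_paired = statistics.mean(differences) / se_paired
p_paired = two_tailed_p(t_paired, n1 - 1)

# Unequal-variance test: the two columns are unrelated samples.
v1 = statistics.variance(first) / n1
v2 = statistics.variance(second) / n2
t_unequal = (statistics.mean(first) - statistics.mean(second)) / math.sqrt(v1 + v2)
# Welch-Satterthwaite approximation for the degrees of freedom (may be non-integer).
df_unequal = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
p_unequal = two_tailed_p(t_unequal, df_unequal)

print(f"paired:  t = {t_paired:.2f}, df = {n1 - 1}, P = {p_paired:.3f}")
print(f"unequal: t = {t_unequal:.2f}, df = {df_unequal:.2f}, P = {p_unequal:.3f}")
```

The paired test gives a P-value of about 0.025, below the 5% cut-off, while the unequal-variance test on the same numbers stays above it. Choosing the test that matches how the data were collected therefore matters for the conclusion.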

Resources: