Environmental Data Analysis BC
ENV 3017
Statistics 2a
Probability
- relative frequency = number of times an event occurs / number of replications
- probability = relative frequency of an event after an indefinitely long number of trials
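The idea that relative frequency settles toward the probability can be sketched with a short simulation (Python is used here for illustration; the course itself works in StatPlus/Excel):

```python
import random

def relative_frequency(trials, seed=0):
    """Estimate P(heads) for a fair coin as the relative frequency
    of heads over a given number of replications."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# The estimate should settle near the true probability, 0.5,
# as the number of replications grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```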
Probability distributions
- the pattern of probabilities for a set of events is called a probability distribution
- two important elements:
- the probability of each event or combination of events must range from 0 to 1
- the sum of the probabilities of all possible events must be equal to 1
- discrete probability distribution => probabilities associated with a series of discrete outcomes
- e.g. for a six-sided die: p(y) = 1/6, y = 1,2,3,4,5,6
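The die example can be checked by simulation; a minimal sketch (stdlib Python, illustrative only):

```python
import random
from collections import Counter

rng = random.Random(42)
rolls = [rng.randint(1, 6) for _ in range(60_000)]
counts = Counter(rolls)

# Relative frequency of each face; all six should sit near p(y) = 1/6,
# and the probabilities of all possible events should sum to 1.
freqs = {face: counts[face] / len(rolls) for face in range(1, 7)}
print(freqs)
print(sum(freqs.values()))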
- continuous probability distribution => probabilities associated with continuous values
- we cannot assign a positive probability to a specific value
- continuous probability distributions are calculated using a probability density function
- the probability associated with a range of values is equal to the area under the PDF (probability density function) curve
- Distributions tutorial; we'll focus on the Normal distribution
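The "probability = area under the PDF" idea can be made concrete by numerically integrating the normal density (a trapezoid-rule sketch in Python; function names are my own):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density function."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob_between(a, b, steps=10_000):
    """Approximate P(a <= X <= b) as the area under the PDF (trapezoid rule)."""
    h = (b - a) / steps
    area = 0.5 * (normal_pdf(a) + normal_pdf(b))
    area += sum(normal_pdf(a + i * h) for i in range(1, steps))
    return area * h

print(prob_between(-1, 1))          # area within one sigma
print(prob_between(0.999, 1.001))   # a very narrow interval -> nearly zero
```

The second call shows why a specific value gets no positive probability: as the interval shrinks, the area under the curve shrinks toward zero.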
Random Variables and random samples
- a random variable is a variable whose values occur at random, following a probability distribution
- when a random variable attains a value, that value is called an observation
- a collection of observations is called a random sample
- Random sample tutorial (Marksman/target)
- larger samples result in histograms resembling the PDF more closely
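The last point can be checked numerically: compare histogram bar heights (as densities) with the PDF for a small and a large sample. A sketch with made-up bin settings:

```python
import random
import statistics

rng = random.Random(1)

def hist_error(n, bins=20, lo=-3.0, hi=3.0):
    """Mean absolute gap between a density histogram of n standard-normal
    draws and the normal PDF evaluated at each bin centre."""
    sample = [rng.gauss(0, 1) for _ in range(n)]
    width = (hi - lo) / bins
    pdf = statistics.NormalDist().pdf
    err = 0.0
    for b in range(bins):
        left = lo + b * width
        count = sum(left <= x < left + width for x in sample)
        density = count / (n * width)
        err += abs(density - pdf(left + width / 2))
    return err / bins

small, large = hist_error(100), hist_error(100_000)
print(small, large)  # the larger sample tracks the PDF more closely
```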
The Normal Distribution
- the distribution used in the above example is the Normal distribution:
f(x) = 1/(σ√(2π)) · exp(−(x−µ)²/(2σ²))
- µ is the true mean, σ the true standard deviation, x the random variable
- Probability tutorial => normal distribution
- last page of tutorial: find the probability between ±1σ and ±2σ (~0.68 and ~0.95)
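These two areas can be computed directly with the standard library's `statistics.NormalDist` (a sketch, not part of the tutorial itself):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# P(-1 <= Z <= 1) and P(-2 <= Z <= 2) as differences of the CDF.
within_1sd = Z.cdf(1) - Z.cdf(-1)
within_2sd = Z.cdf(2) - Z.cdf(-2)
print(round(within_1sd, 4))  # ~0.6827
print(round(within_2sd, 4))  # ~0.9545
```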
- the normal curve is symmetric about the average, and the total area under it is 100% (fig).
- Standard units say how many σ's a value is above (+) or below (-) the average.
- How to normalize the histogram: x (height in inches) becomes (x−µ)/σ, and the y-axis changes from %/inch to %/standard unit (fig).
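The conversion to standard units is a one-line transformation; here is a sketch with hypothetical height values (the average of 69 inches and SD of 3 inches are made up for illustration):

```python
# Hypothetical heights in inches; assume average 69 and SD 3.
mu, sigma = 69.0, 3.0
heights = [63.0, 69.0, 75.0]

# (x - mu) / sigma: how many sigmas each value is above (+) or below (-)
# the average.
standard_units = [(x - mu) / sigma for x in heights]
print(standard_units)  # [-2.0, 0.0, 2.0]
```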
- many histograms have roughly the same shape as the normal curve.
- the main reason why the normal distribution is a good approximation to many experiments in the sciences is the Central Limit Theorem: "As the sample size, n, increases, the sampling distribution of the mean approaches a normal distribution, no matter how the population being sampled is distributed."
- if a list of numbers follows the normal curve, the percentage of entries falling in a given interval can be estimated by converting the interval to standard units and then finding the corresponding area under the normal curve. As already mentioned above, about 68% of the observations in a large sample are within one σ of the average, and about 95% are within two σ's of the average.
Parameters and estimators
- the normal distribution is described by two parameters, µ and σ
- these parameters can be estimated by the sample mean and the sample standard deviation (SD)
- how large must the sample be to properly estimate µ?
- example: 50 samples, with 100 observations each, drawn from a normal distribution (µ = 0, σ = 1): the mean of the means is close to 0, and the SD of the means is ~1/10 of the σ of the original normal distribution
- (Population parameters tutorial)
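The 50-samples-of-100 experiment is easy to replicate with stdlib Python (a sketch; seed and variable names are my own):

```python
import random
import statistics

rng = random.Random(7)
mu, sigma, n_obs, n_samples = 0.0, 1.0, 100, 50

# 50 sample means, each from 100 observations of N(0, 1).
means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n_obs))
         for _ in range(n_samples)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
print(mean_of_means)  # close to 0
print(sd_of_means)    # close to sigma / sqrt(100) = 0.1, i.e. ~1/10 of sigma
```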
- how to find out if data are normally distributed?
- use StatPlus to generate random normal data
- display data in two ways (using StatPlus)
- as histogram with normal distribution overlay
- as normal probability plot (the normal score is the value you would expect if your sample came from a normal distribution: think of an infinite number of samples, each consisting of 100 observations, and average the lowest number, the second lowest, and so forth)
- the SD of the sample mean is also referred to as the Standard Error (SE) of the mean
- by increasing the sample size, we can reduce the SE
- Central limit theorem
- the central limit theorem describes the sampling distribution of the sample average, taken from any distribution
- If each observation follows a distribution with a mean, µ, and standard deviation, σ, the Central Limit Theorem states that the distribution of the sample averages approximately follows a Normal distribution with a mean, µ, and a standard deviation of σ/n^(1/2) (where n is the sample size). This theorem is true for any distribution, as long as the mean and standard deviation of the distribution exist and are finite.
- (Central limit theorem tutorial)
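A simulation makes the "any distribution" claim vivid: here sample averages are taken from a uniform population, which is decidedly non-normal (stdlib Python; sample sizes are arbitrary choices):

```python
import random
import statistics

rng = random.Random(0)

# Population: uniform on [0, 1], with mu = 0.5 and sigma = sqrt(1/12).
n = 48
means = [statistics.fmean(rng.random() for _ in range(n))
         for _ in range(5_000)]

expected_se = (1 / 12) ** 0.5 / n ** 0.5
print(statistics.fmean(means))  # close to mu = 0.5
print(statistics.stdev(means))  # close to sigma / sqrt(n)
print(expected_se)
```

The averages cluster around the population mean with spread σ/√n, exactly as the theorem predicts, even though no individual observation is normal.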
Standard Error
A repetition of an experiment will result in variable results. Imagine, for example, repeating an ozone concentration measurement many times. What is the error of your average ozone concentration? The more measurements you make (the larger n), the more precise the average should be.
The standard error of the average (SE) is:
SE = σ/√n
SE is therefore a measure of how precisely the sample mean estimates the true mean, µ. Statistical theory tells us (for large n) that a band around the sample mean of ±2 SE will include the value of µ 95% of the time. For example, if the sample average is 0 and the SE is 0.1, we can be 95% confident that the value of µ lies somewhere between -0.2 and 0.2. We don't know the exact value of µ, but we have narrowed the range of likely values. By increasing the sample size n, we can make the band as narrow as we wish.
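The SE and the ±2 SE band are a few lines of Python; the ozone values below are made up for illustration:

```python
import math
import statistics

# Hypothetical repeated ozone measurements (ppb); numbers are invented.
measurements = [41.2, 39.8, 40.5, 42.1, 40.0, 41.5, 39.6, 40.9, 41.1, 40.3]

n = len(measurements)
mean = statistics.fmean(measurements)
sd = statistics.stdev(measurements)  # sample standard deviation
se = sd / math.sqrt(n)               # standard error of the mean

# A band of +/- 2 SE around the sample mean covers mu about 95% of the time.
low, high = mean - 2 * se, mean + 2 * se
print(f"mean = {mean:.2f}, SE = {se:.2f}, 95% band = ({low:.2f}, {high:.2f})")
```

Doubling the precision (halving the SE) requires four times as many measurements, since SE shrinks only as √n.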
Resources:
- Freedman, D., Pisani, R., Purves, R., and Adhikari, A. (1991) Statistics. WW Norton & Company, New York, 2nd ed., 514pp.
- Fisher, F.E. (1973) Fundamental Statistics Concepts. Canfield Press, San Francisco, 371pp.
- Berenson, M.L., Levine, D.M., and Rindskopf, D. (1988) Applied statistics - A first course. Prentice Hall, Englewood Cliffs, NJ, 557pp.
- Lyons, L. (1991) A practical guide to data analysis for physical science students. Cambridge University Press, Cambridge, UK, 95p. (Barnard QC 33.L9 1991 c.1)
- Hartwig, F., and Dearing, B.E. (1979) Exploratory data analysis. Sage University Paper series on Quantitative Applications in the Social Sciences, 16, Sage Publications, Beverly Hills and London, 83p. (Barnard HA 29.H257 c.1)
- Jaffe, A.J., and Spirer, H.F. (1987) Misused statistics. Popular statistics, 5. Marcel Dekker, Inc., New York, 236p. (Barnard HA 29.J29 1987)
- Knoke, D., and Bohrnstedt, G.W. (1991) Basic social statistics. F.E. Peacock Publishers, Inc., Itasca, Illinois, 363p. (Barnard HA 29.K735 1991 c.2)
- Levin, J., and Fox, A.J. (1994) Elementary statistics in social research. Harper Collins College Publishers, New York, 508p. (Barnard HA 29.L388 1994 c.1)
- Welkowitz, J., Ewen, R.B., and Cohen, J. (1991) Introductory statistics in the behavioral sciences. Harcourt Brace & Company, Orlando, Florida, 391p.