Environmental Data Analysis EESC BC 3017

Histograms/Boxplots

Definitions

The histogram

The histogram is a graphical presentation of a list of data: the range of the data is divided into bins (class intervals), and a block is drawn over each bin. To make the general shape of a histogram independent of how the bin widths are chosen, the vertical axis can be normalized to show ‘% per unit’ of the horizontal axis. With this ‘density scale’ on the vertical axis, the area of each block (height in % per unit × width in units) comes out in %, because the units of the horizontal axis cancel. The area under the histogram over an interval equals the percentage of cases in that interval, and the total area under the histogram is 100%.
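A minimal sketch of the density scale, assuming a small hypothetical data list and deliberately unequal bins (the helper name `density_histogram` and the bin edges are illustrative, not from any particular package). It converts each bin's count to a percentage, divides by the bin width to get % per unit, and then checks that height × width recovers the percentages and that the total area is 100%.

```python
def density_histogram(data, edges):
    """Heights in % per unit for bins [edges[i], edges[i+1]).

    The last bin also includes its right edge. Values outside the
    edges are not counted (this sketch assumes the edges cover the data).
    """
    n = len(data)
    heights = []
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        last = i == len(edges) - 2
        count = sum(1 for x in data if lo <= x < hi or (last and x == hi))
        percent = 100.0 * count / n          # % of cases in this bin
        heights.append(percent / (hi - lo))  # % per unit of the x-axis
    return heights

# Hypothetical data and unequal bin edges
data = [1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 5.5, 7.9]
edges = [0, 2, 4, 8]

heights = density_histogram(data, edges)
# Block area = height (% per unit) x width (units) -> units cancel, leaving %
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]

print(heights)     # [6.25, 18.75, 12.5]
print(areas)       # [12.5, 37.5, 50.0]
print(sum(areas))  # 100.0
```

Note that the widest bin (4 to 8) holds the most cases but is not the tallest block: on the density scale its 50% is spread over 4 units.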

See the example of the distribution of family income in the United States (fig).

Measures of the center: mean, median, and mode

A typical list of values of a variable can be summarized by its average, which locates the center, and its standard deviation (s or SD), which measures the spread around the average.

Mean (Average) of a list = sum of entries/number of entries.
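A short sketch of the average and the SD for a hypothetical list. The mean follows the definition above directly. For the SD, two conventions are shown as an assumption about which one a given tool uses: dividing by n (the root-mean-square of the deviations from the average; Excel's STDEV.P) and dividing by n − 1 (the sample SD; Excel's STDEV.S and the older STDEV).

```python
import math
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical list of measurements

# Mean (average) = sum of entries / number of entries
mean = sum(data) / len(data)

# SD as root-mean-square of the deviations from the average (divide by n)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

# Sample SD (divide by n - 1), the convention of Excel's STDEV.S
sd_sample = math.sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))

print(mean)                                       # 5.0
print(round(sd, 3), round(sd_sample, 3))
print(math.isclose(sd, statistics.pstdev(data)))  # True
```

For long lists the two SD versions are nearly equal; the difference matters mainly for small samples.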

The average locates the center of a histogram, in the sense that the histogram balances when supported at the average (fig).
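The balance-point property can be checked numerically. Treating each value as a unit weight on a ruler, the net turning effect about a pivot is the sum of the signed distances x − pivot; it is zero exactly when the pivot is at the average. The function name and data below are hypothetical, for illustration only.

```python
data = [2, 3, 3, 5, 7, 10]  # hypothetical, right-skewed list

def net_turning_effect(values, pivot):
    """Sum of signed distances from the pivot (each value is a unit weight).

    Zero: balances at the pivot. Positive: tips to the right.
    Negative: tips to the left.
    """
    return sum(x - pivot for x in values)

mean = sum(data) / len(data)             # 5.0

print(net_turning_effect(data, mean))    # 0.0 -> balances at the average
print(net_turning_effect(data, 4.0))     # 6.0 -> tips to the right of 4.0
```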

Half the area under a histogram lies to the left of the median, and half to the right. The median is another way to locate the center of a histogram (fig).
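A minimal sketch of the median for a list: sort the values and take the middle one, or the average of the two middle ones when the number of entries is even. The helper is written out by hand for clarity and cross-checked against Python's standard `statistics.median`; the data are hypothetical.

```python
import statistics

def median(values):
    """Middle value of the sorted list; mean of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

data = [2, 3, 3, 5, 7, 10]                       # hypothetical list
print(median(data))                              # 4.0
print(median([9, 1, 4]))                         # 4
print(median(data) == statistics.median(data))  # True
```

For this list the average (5.0) is larger than the median (4.0): the long right tail (the value 10) pulls the balance point to the right, while the median only depends on the order of the values.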

A 'compromise' between the median and the mean is the trimmed mean, which discards a fixed percentage of the values at the low and high ends of the sorted list and averages the rest (available in StatPlus; Excel's TRIMMEAN function also computes it).
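A sketch of the trimmed mean under one stated convention: `proportion` is the fraction cut from EACH end of the sorted list, rounded down to a whole number of values (this is the convention of, for example, SciPy's `trim_mean`). Excel's TRIMMEAN instead takes the total fraction excluded and splits it between the two ends, so a per-end proportion of 0.1 here roughly corresponds to a TRIMMEAN percent of 0.2. The function name and data are hypothetical.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping int(proportion * n) values from EACH end of the sorted list."""
    if not values:
        raise ValueError("trimmed mean of an empty list is undefined")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    s = sorted(values)
    k = int(proportion * len(s))  # number of values cut from each end
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical list with one outlier

print(sum(data) / len(data))     # 14.5  (ordinary mean, pulled up by 100)
print(trimmed_mean(data, 0.1))   # 5.5   (drops 1 and 100, averages the rest)
print(trimmed_mean(data, 0.0))   # 14.5  (no trimming = ordinary mean)
```

Trimming 0% gives the ordinary mean, and trimming close to 50% from each end approaches the median, which is why the trimmed mean sits between the two.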

The mode is the most frequently occurring value in a list (or range) of data.
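A small sketch for finding the mode by counting how often each value occurs. Because a list can have several values tied for the highest count, this hypothetical helper returns all of them (Python 3.8+ also provides `statistics.multimode`, which behaves the same way). For continuous measurements, where exact repeats are rare, the mode is usually read off a histogram as the bin with the tallest block instead.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count (a list may have several modes)."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty list is undefined")
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5, 7, 10]))  # [3]
print(modes([1, 1, 2, 2, 3]))      # [1, 2]  (two modes: bimodal list)
```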

There are other measures of the center, such as the geometric mean and the harmonic mean, which are not covered in this class.

Distribution statistics

Relevant EXCEL functions:

Resources:

Freedman, D., Pisani, R., Purves, R., and Adhikari, A. (1991) Statistics, 2nd ed. W. W. Norton & Company, New York, 514 pp.

Fisher, F.E. (1973) Fundamental Statistics Concepts. Canfield Press, San Francisco, 371 pp.

Berenson, M.L., Levine, D.M., and Rindskopf, D. (1988) Applied Statistics: A First Course. Prentice Hall, Englewood Cliffs, NJ, 557 pp.