NOTES - ENVIRONMENTAL DATA ANALYSIS BC 3017
notes 2004
schedule 2006
1- 9/5 overview/motivation
9/7 Units & Back of the Envelope calculation - did not do
2- 9/12 NYC water supply & Excel warmup, sea level rise
9/14 Major ions in precip, planning of experiments
3 - 9/19 How an ion chromatograph works, hand entry of data, discussion of measurement logistics
entry of data fast and dirty, use previous standard values.
9/21 IC lab - measurements in two groups
4 - 9/26 calibration curves, data analysis
9/28 descriptive statistics, histograms, boxplots
5 - 10/3 cleaned up data sheets
10/5 began stats 2, and had students present their stories using graphs they prepared in advance
6 - 10/10 draft of lab handed in, finished stats 2 => need to talk about systematic and bias errors earlier
10/12 stats 3, discussion of lab reports, different errors, error propagation, t-distribution, confidence interval
(need to move different kinds of errors up!)
7 - 10/17 t-tests
10/19 revised lab report due - ozone
Tue 9/3/02
need to bring
pen - notes for questionnaire
sign-up sheet
book, CD
introduce Dave and myself
motivate the course
data analysis skills extremely important in professional life, facts, numbers
examples: scientist, politician, NGO, business
society gets more and more technical
quantitative skills are one of the key requirements for finding a job (besides writing, communication)
course aims
- provide tools for environmental data analysis & communication
- derivation of models (either conceptual or mathematical)
- communicate findings to others
environmental problems difficult to address scientifically because:
they are complex and interdisciplinary
often can't manipulate system, lack of control
data analysis difficult because of weak relationships
we need to learn how to
obtain data
organize data
handwritten sheet
tables
manipulate data
visualize data
scatter plots
property/property
timeseries
maps
interpret data
(conceptual/mathematical models)
timeseries
communicate data and interpretation to others
scientific paper
scientific talk
handout syllabus for class - point to web site
password protection
go through elements of the course
student background questionnaire data
Algebra/Calculus 5/9
differential equations 8
Climate/Solid Earth 4/2
Biosphere 2: 4
statistics Intro 1
statistics advanced 0
familiarity with software
e-mail (pine, Eudora, hot-mail): OK
Netscape/Explorer
FTP/Fetch file transfer: 6 not
Wordprocessor
Excel, version: all, 1 has Office 98
plotting software: SSTAT, JUMP, SPSS: 1
programming: Java, C++: 1
statistical packages, SPSS:
Matlab:
technical aspects
computer accounts
access to computers/internet
which OS and vers. of Excel?
bulletin board?
advice
get a floppy disk for data files to be used in class, bring a second one in case the first one is full
always keep 2 copies of all your files
Zip disk also an option
buy the book ASAP, bring the CD-ROM Thursday so that we can install the software
Th 9/5/02
Need to bring:
- list of participants - sign-up
- CD
- PM handouts
class
- brief overview of class if there are new students
- textbook problem? 1st homework set
- check names, e-mail addresses, reminder, get the book
- units, unit conversions, scientific notation
- a few exercises
- ice melt lab
- NYC water supply lab
- FTP files, practice transferring files: FTP a file to your account, send it to yourself as e-mail, and FTP it back to the laptop under a different name; could use the NYC water supply file
- e-mail file to yourself
- more back of the envelope calculations
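Possible worked example for the units / back of the envelope part (a sketch only, not from the class material): convert a daily water volume in US gallons to a flow in m^3/s and print it in scientific notation. The input of 1.0e9 gallons per day is an illustrative round number assumed here, not a value taken from the NYC water supply file; the function name is made up for this sketch.

```python
# Unit conversion sketch: US gallons per day -> cubic meters per second.
LITERS_PER_US_GALLON = 3.785411784  # exact by definition of the US gallon
LITERS_PER_M3 = 1000.0
SECONDS_PER_DAY = 24 * 60 * 60


def gallons_per_day_to_m3_per_s(gal_per_day):
    """Convert a flow in US gallons/day to m^3/s, one unit at a time."""
    liters_per_day = gal_per_day * LITERS_PER_US_GALLON
    m3_per_day = liters_per_day / LITERS_PER_M3
    return m3_per_day / SECONDS_PER_DAY


# Assumed, illustrative input (not a measured value).
flow = gallons_per_day_to_m3_per_s(1.0e9)
print(f"{flow:.3e} m^3/s")  # scientific notation, 4 significant figures
```

Writing each conversion factor as its own named step makes it easy to check that the units cancel, which is the point of the exercise.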
Tu 9/10/02
Th 9/12/02
Need to bring
CD ROM
Go over certain tricks with Excel etc
Bring a disk for your files as a backup!
collect homework, and NYC printout
go over NYC water example and talk about Excel tricks
installation of addins in student directory
PM lecture
Discussion of experiment, handout of sampling sheet, tape, lens
Tu 10; We 11, Th 12, Fr 13, Sa 14.
36 hours 8pm to 8am
same location on different dates?
where do you all live?
Th 9/19/02
accept late homework
collect all the counts in one file
lecture histograms/boxplots
talk about histograms and normalized histograms
median balances the histograms
percentiles?
description of histograms
boxplots tutorial?
work on own counts
calculate average fluxes and SE
make histograms of all your counts (normalized, % per bin size); perhaps rather use the STATPLUS multiple histogram function
make another plot that shows experiment and average and SE of fluxes
make boxplot of data
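The steps above could also be checked outside Excel. A minimal sketch, assuming the counts are available as a plain list of numbers: it computes the mean and standard error (SE = sample standard deviation / sqrt(n)), a histogram normalized to percent per bin, and the five numbers a boxplot is drawn from. The counts listed are made up for illustration, and all function names are hypothetical.

```python
import math
import statistics


def mean_and_se(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


def normalized_histogram(values, bin_width):
    """Return {bin start: percent of all values falling in that bin}."""
    counts = {}
    for v in values:
        start = math.floor(v / bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    n = len(values)
    return {start: 100.0 * c / n for start, c in sorted(counts.items())}


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the basis of a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


# Made-up particle counts, for illustration only.
example_counts = [12, 15, 9, 22, 17, 14, 11, 30, 16, 13]

mean, se = mean_and_se(example_counts)
print(f"mean = {mean:.1f}, SE = {se:.1f}")
for start, percent in normalized_histogram(example_counts, 5).items():
    print(f"{start:>3} to {start + 5:<3} {percent:5.1f} %")
print("min, Q1, median, Q3, max:", five_number_summary(example_counts))
```

Note that quartile conventions differ between programs, so Q1 and Q3 from this sketch may not match Excel or StatPlus exactly.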
Tu 9/24/02
generate summary spreadsheet for the students to download
talk about fluxes and what they mean
discuss how to best present the data so that they tell the most
look at systematic/chance errors and discuss errors in general again
organize the data appropriately
plot bar diagrams with error bars
what does it tell you? where are real differences
highlight cells that are really different
look at weather data
Th 9/26/02
old and new homework
Excel training issue
how to write a lab report
discuss MET data and expectations
update spreadsheets
install Statplus again
goal is to look at your own data
discuss data
motivation: what do error bars really mean?
probability distributions
what does this really mean?
means of repeated samples are approximately normally distributed
generate random numbers, then calculate averages and overlay the normal distribution
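One way to run the random-number demonstration (a sketch under assumptions, not the class spreadsheet): draw uniform random numbers between 0 and 1, average them in groups of 30, and compare the binned averages with the normal density that has mean 0.5 and standard deviation sqrt(1/12/30). The seed, group size, number of groups, and bin width are arbitrary choices made here.

```python
import math
import random
import statistics


def sample_means(n_samples, sample_size, rng):
    """Average sample_size uniform(0, 1) draws; repeat n_samples times."""
    return [
        statistics.mean([rng.random() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


rng = random.Random(2002)  # fixed seed so the run is repeatable
sample_size = 30
means = sample_means(5000, sample_size, rng)

# A uniform(0, 1) number has mean 1/2 and variance 1/12, so the
# average of sample_size of them has variance 1/12/sample_size.
mu = 0.5
sigma = math.sqrt(1.0 / 12.0 / sample_size)

# Text "overlay": observed density per bin next to the normal density.
bin_width = 0.02
for i in range(17, 33):
    lo = i * bin_width
    hi = lo + bin_width
    observed = sum(lo <= m < hi for m in means) / (len(means) * bin_width)
    expected = normal_pdf(lo + bin_width / 2.0, mu, sigma)
    print(f"{lo:.2f}-{hi:.2f}  observed {observed:6.2f}  normal {expected:6.2f}")

# For a normal distribution about 68% of values lie within one sigma.
within_one_sigma = sum(abs(m - mu) <= sigma for m in means) / len(means)
print(f"fraction within one sigma: {within_one_sigma:.3f}")
```

The individual numbers are flat (uniform), yet their averages pile up in a bell shape; this is what justifies reading a mean with its SE as a normal-type error bar.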
Th 10/24
give midterm to Dave
talk about assistantships
lecture about atmospheric chemistry (NOx and VOC dependency)
meteorological control
-global circulation
-highs/lows
-daily wind cycles
-inversions
show them the StatPlus correlation plot and how to add trendlines
Students lab:
focus on NOx, VOC, temperature, light, ozone (both stations) and look
at relationships in time series and correlation plots.
pick one particular day
look also at maximum values every day and look at correlations
Tu 10/29
return the exam
discuss the exam
go through basic regression analysis
- plotting two variables against each other results in a scatterplot, and the data do not align perfectly
- linear regression analysis means you try to find the line that best estimates the relationship between the variables
- the line you find is the fitted regression line; its equation is called the regression equation
- simplest model, equation: Y = a + bx, with Y the dependent and x the independent variable
- a: intercept, b: slope (= vertical change / horizontal change)
- residual: vertical gap between the line and a point
- when fitting a line, we assume a linear model: Y = alpha + beta*x + epsilon
- predicted value: Y' = a + bx
- try to minimize the sum of squared residuals = Sum (i=1 to n) (Yi - Y'i)^2
- show example from book: Bcancer.xls (mortality rate, breast cancer, temperature, 1965), is there a relationship?
- add trendline, R^2: coefficient of determination (square of the correlation coefficient r)
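The regression steps listed above can be sketched in a few lines. This is a minimal illustration of ordinary least squares for Y' = a + bx using the standard closed-form solution; it is not the Excel trendline code, the function names are hypothetical, and the x/y values are made up (they are not the Bcancer.xls data).

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line Y' = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: vertical change per horizontal change
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum over all points of (Yi - Y'i)^2, the quantity being minimized."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def r_squared(x, y, a, b):
    """Fraction of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sum_squared_residuals(x, y, a, b) / ss_total


# Made-up example values, for illustration only.
x_values = [31.8, 34.0, 40.2, 42.1, 45.0, 49.3, 51.3]
y_values = [67.3, 52.5, 68.1, 84.6, 89.2, 95.9, 102.5]

a, b = fit_line(x_values, y_values)
print(f"Y' = {a:.2f} + {b:.2f} x")
print(f"sum of squared residuals = {sum_squared_residuals(x_values, y_values, a, b):.1f}")
print(f"R^2 = {r_squared(x_values, y_values, a, b):.3f}")
```

Changing a or b slightly away from the fitted values and recomputing the sum of squared residuals shows that it only goes up, which is what "best fit" means here.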
relevant correlations:
- temperature/ozone, perhaps 1 hour delay, very strong correlation
- wind from different directions at max wind speed
- inversions at night
- solar radiation response somewhat delayed, perhaps not that important
- non methane hydrocarbons
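To test a delayed response such as the possible 1 hour lag between temperature and ozone, one option is to shift one hourly series against the other and recompute the correlation coefficient for each lag. A minimal sketch with synthetic data: the "ozone" series below is built by construction as the temperature series delayed by one hour, so the result says nothing about the real DEC measurements. Function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(driver, response, lag):
    """Correlate driver[t] with response[t + lag] for a lag >= 0 (in samples)."""
    if lag > 0:
        return pearson_r(driver[:-lag], response[lag:])
    return pearson_r(driver, response)


# Synthetic hourly data for three days: a daily temperature cycle, and an
# "ozone" series that simply follows temperature one hour later.
temperature = [20.0 + 8.0 * math.sin(2.0 * math.pi * (h - 9) / 24.0) for h in range(72)]
ozone = [2.0 * t for t in [temperature[0]] + temperature[:-1]]

for lag in range(4):
    r = lagged_correlation(temperature, ozone, lag)
    print(f"lag {lag} h: r = {r:.3f}")

best_lag = max(range(4), key=lambda k: lagged_correlation(temperature, ozone, k))
print("lag with the highest correlation:", best_lag, "h")
```

With real data, neighboring lags will usually give similar r values because both series follow the daily cycle, so a small difference between lags should not be over-interpreted.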
Th 11/7/02
return lab reports, brief discussion
discussion of new lab report, due next Thursday?
main question to be answered: Why did NYC experience high ozone levels in the summer of 2002?
focus on the precursors, NOx, NMHC (as best estimate of VOC), temperature, solar radiation, wind
Introduction
ozone key air quality parameter
health effects
factors contributing to ozone formation
studying a specific period in the summer of 02 in NYC
Methods
entered data from sheets provided by DEC (reference)
performed data analysis
Results
present the data in such a way that you see something
timeseries, scatter plots, max concentration plots
regression analysis
Discussion
which parameter is limiting?
some conclusions