Lab 4 Regression analysis
-
Multivariable matrix
-
Use the multi-variable chart function of StatPlus to create a matrix of
scatterplots with temperature, salinity, density (sigma_t), AOU, and CO2.
(This step is not required for the lab results, so you can skip it if your
computer cannot handle the extra memory it takes up.)
-
Use the multivariate function of StatPlus to create a matrix of correlation
coefficients with temperature, salinity, density (sigma_t), AOU, and CO2.
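
For those who want to check the StatPlus output by hand, the correlation matrix can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers in `data` are placeholders chosen for this sketch, not lab measurements, and only three of the five columns are shown.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a nested dict {name: {name: r}} for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Placeholder values for illustration only (not lab data).
data = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "salinity": [2.0, 1.0, 4.0, 3.0, 5.0],
    "CO2": [10.0, 8.0, 6.0, 4.0, 2.0],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 3) for other in data])
```

Each row of the printed matrix corresponds to one variable; the diagonal is 1 because every variable is perfectly correlated with itself.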
-
Which variables are significantly correlated with concentrations of CO2?
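
If StatPlus does not report significance directly, one common way to judge it is the t-test for a correlation coefficient, t = r*sqrt((n-2)/(1-r^2)) with n-2 degrees of freedom. The sketch below assumes that test is acceptable for this lab; the values of `r` and `n` are hypothetical, and the critical value has to be looked up in a t-table for your own n.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson r differs from zero.

    Has n - 2 degrees of freedom under the null hypothesis of no correlation.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical example values, not results from the lab data.
r = 0.6
n = 27
t_value = correlation_t_statistic(r, n)

# Two-tailed 5% critical value for 25 degrees of freedom, taken from a t-table.
t_critical = 2.060
is_significant = abs(t_value) > t_critical
print(round(t_value, 3), is_significant)
```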
-
What does this imply about the dominant mechanisms controlling changes
in CO2?
-
Are these the only factors that control the concentration of CO2?
-
Calculate the regression equations:
-
Use the variance and covariance to calculate the least squares regression
for the variable that is best correlated with CO2. (Remember that the VAR
and COVAR functions in Excel calculate the variance and covariance; however,
COVAR divides by n while VAR divides by (n-1), so COVAR must be multiplied
by n and divided by (n-1).)
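
The calculation can be sketched as follows: the least squares slope is the sample covariance divided by the sample variance of the predictor, and the line passes through the two means. The helper names and the numbers are placeholders for this sketch; `sample_variance` is meant to mimic Excel's VAR and `population_covariance` to mimic COVAR.

```python
def mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    """Variance with an n - 1 denominator, like Excel's VAR."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def population_covariance(x, y):
    """Covariance with an n denominator, like Excel's COVAR."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def least_squares_line(x, y):
    """Slope and intercept of the least squares regression of y on x."""
    n = len(x)
    # Rescale COVAR-style covariance so it matches the n - 1 variance.
    sample_cov = population_covariance(x, y) * n / (n - 1)
    slope = sample_cov / sample_variance(x)
    intercept = mean(y) - slope * mean(x)
    return slope, intercept

# Placeholder values for illustration only (not lab data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # stands in for the best-correlated variable
y = [2.0, 1.0, 4.0, 3.0, 5.0]   # stands in for CO2

slope, intercept = least_squares_line(x, y)
print(round(slope, 3), round(intercept, 3))
```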
-
Use the REGRESSION function in Excel for the variable that is best correlated
with CO2.
-
Use the VAR and COVAR functions in Excel to calculate the reduced major
axis regression for the variable that is best correlated with CO2.
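
As a rough sketch, the reduced major axis slope is the ratio of the standard deviations of the two variables, with its sign taken from the covariance, and the line again passes through the two means. The function names and numbers below are placeholders for illustration, not lab values.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    """Variance with an n - 1 denominator, like Excel's VAR."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance_sum(x, y):
    """Sum of cross products of deviations; only its sign is needed here."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def reduced_major_axis_line(x, y):
    """Slope and intercept of the reduced major axis regression."""
    sign = math.copysign(1.0, covariance_sum(x, y))
    slope = sign * math.sqrt(sample_variance(y) / sample_variance(x))
    intercept = mean(y) - slope * mean(x)
    return slope, intercept

# Placeholder values for illustration only (not lab data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # stands in for the best-correlated variable
y = [2.0, 1.0, 4.0, 3.0, 5.0]   # stands in for CO2

rma_slope, rma_intercept = reduced_major_axis_line(x, y)
print(rma_slope, rma_intercept)
```

Unlike the least squares slope, this slope does not depend on which variable is treated as x and which as y, apart from being inverted.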
-
Plot each of the above regression equations on a single plot.
-
For each of the above equations, calculate the sum of squared residuals.
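
The sum of squared residuals is the sum over all points of (measured - predicted)^2, where the prediction comes from the fitted line. A minimal sketch, using placeholder data and two hypothetical lines (the coefficients are typed in by hand here and are not results from the lab data):

```python
def sum_squared_residuals(x, y, slope, intercept):
    """Sum of (measured - predicted)^2 for the line y = intercept + slope*x."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Placeholder values for illustration only (not lab data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

# Two hypothetical candidate lines for the same data.
ssr_line_a = sum_squared_residuals(x, y, slope=0.8, intercept=0.6)
ssr_line_b = sum_squared_residuals(x, y, slope=1.0, intercept=0.0)
print(round(ssr_line_a, 3), round(ssr_line_b, 3))
```

The same function can be applied to each regression equation in turn so that the sums are directly comparable.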
-
Which method shows the smallest sum of squared residuals and thus is the
least squares method?
-
Which regression equation is most appropriate if we assume that there are
measurement errors in both CO2 and the variable best correlated with CO2,
and why?
-
Multivariable predictions
-
Plot the residuals between the predicted CO2 and the measured CO2
versus the variable that is also well correlated with CO2 (the second
largest R^2, not including oxygen).
-
Is there a trend? If so, use the linear trend to create an equation that
predicts CO2 based on two other variables. Calculate the sum of squared
residuals for this equation and subtract it from the total sum of squares
of CO2 (var(CO2)*(n-1)). What fraction of the variance in TCO2 does this
new equation represent?
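
One way to read this step is as a two-stage fit: regress CO2 on the first variable, regress the leftover residuals on the second variable, then add the two lines together. The sketch below follows that reading; the variable names and numbers are placeholders, not lab data.

```python
def mean(values):
    return sum(values) / len(values)

def fit_line(x, y):
    """Least squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Placeholder values for illustration only (not lab data).
co2 = [2.0, 1.0, 4.0, 3.0, 5.0]
first = [1.0, 2.0, 3.0, 4.0, 5.0]    # best-correlated variable
second = [1.0, 0.0, 1.0, 0.0, 1.0]   # second-best variable

# Stage 1: CO2 against the first variable.
b1, a1 = fit_line(first, co2)
residuals = [c - (a1 + b1 * f) for c, f in zip(co2, first)]

# Stage 2: the linear trend of the residuals against the second variable.
b2, a2 = fit_line(second, residuals)

# Combined prediction from both variables.
predicted = [a1 + b1 * f + a2 + b2 * s for f, s in zip(first, second)]
ssr = sum((c - p) ** 2 for c, p in zip(co2, predicted))

# Total sum of squares of CO2, equal to var(CO2)*(n-1).
co2_mean = mean(co2)
total_ss = sum((c - co2_mean) ** 2 for c in co2)

fraction_explained = (total_ss - ssr) / total_ss
print(round(ssr, 4), round(fraction_explained, 4))
```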
-
Assuming no error in the measurement of the two variables that best correlate
with CO2, use the LINEST or REGRESSION function to calculate
a multivariable least squares fit to the TCO2 data. As above, calculate
the sum of squared residuals and calculate how much of the variance in
CO2 can be explained using a least squares fit with multiple (two)
parameters.
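
For comparison with the LINEST output, a simultaneous fit with two predictors can be sketched by solving the two normal equations directly. This is only an illustration under the assumption of a model CO2 = a + b1*x1 + b2*x2; the function names and numbers are placeholders, not lab data.

```python
def mean(values):
    return sum(values) / len(values)

def centered_sum(u, v):
    """Sum of products of deviations from the means of u and v."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def fit_two_predictors(x1, x2, y):
    """Least squares a, b1, b2 for y = a + b1*x1 + b2*x2."""
    s11 = centered_sum(x1, x1)
    s22 = centered_sum(x2, x2)
    s12 = centered_sum(x1, x2)
    s1y = centered_sum(x1, y)
    s2y = centered_sum(x2, y)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2

# Placeholder values for illustration only (not lab data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
co2 = [3.0, 5.0, 4.0, 8.0, 9.0]

a, b1, b2 = fit_two_predictors(x1, x2, co2)
predicted = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
ssr = sum((c - p) ** 2 for c, p in zip(co2, predicted))
total_ss = centered_sum(co2, co2)   # equals var(CO2)*(n-1)
fraction_explained = (total_ss - ssr) / total_ss
print(round(fraction_explained, 4))
```

Because both slopes are solved for at once, this fit can never have a larger sum of squared residuals than a fit built one variable at a time.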
-
Perform a multivariate regression to predict the CO2 with all
the variables (temperature, salinity, density (sigma_t), Dt and AOU) using
the regression function. Does this improve the sum of squared residuals?
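
To see how the sum of squared residuals behaves as predictors are added, the fit can be sketched for any number of variables by solving the normal equations. This is a minimal illustration, not a replacement for the regression function: the helper names are invented for this sketch, only three placeholder predictors are used, and none of the numbers are lab data.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_multiple(predictors, y):
    """Least squares coefficients [intercept, b1, b2, ...]."""
    n = len(y)
    columns = [[1.0] * n] + [list(p) for p in predictors]
    k = len(columns)
    xtx = [[sum(columns[i][t] * columns[j][t] for t in range(n))
            for j in range(k)] for i in range(k)]
    xty = [sum(columns[i][t] * y[t] for t in range(n)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def sum_squared_residuals(predictors, y, coeffs):
    total = 0.0
    for t in range(len(y)):
        predicted = coeffs[0] + sum(c * p[t]
                                    for c, p in zip(coeffs[1:], predictors))
        total += (y[t] - predicted) ** 2
    return total

# Placeholder values for illustration only (not lab data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0, 7.0]
x3 = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
co2 = [3.0, 5.0, 4.0, 8.0, 9.0, 10.0]

ssr_by_count = []
for count in (1, 2, 3):
    used = [x1, x2, x3][:count]
    ssr_by_count.append(sum_squared_residuals(used, co2,
                                              fit_multiple(used, co2)))
print([round(s, 4) for s in ssr_by_count])
```

Adding a predictor to a least squares fit cannot increase the sum of squared residuals, so a smaller value alone does not show that the extra variable is meaningful.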
-
Interpretation of the data:
-
Using the results above, explain as precisely as possible what they imply
about the variability of CO2 as observed by flip. You should
format this in the following order:
-
Results: Explain exactly what your statistical analysis shows (by explaining
graphs and calculations produced above). It will also be helpful to show
a time series of CO2 and the two parameters that best correlate
with CO2 to set the stage.
-
Summary: Explain what the two parameters that best correlate with CO2
imply about the mechanisms that result in variability of CO2.
Explain which process seems to dominate and how this is different from
the basic processes that I have explained in the introduction
to this lab. In particular, you should note the photosynthetic quotient
that we expect (moles of O2 produced / moles of CO2 used; see the
experiment section of the introduction to this lab).