Assessing Spatial Skill in Climate Field Reconstructions

Friday, May 6, 2011

***Update***: This paper has now been published in GRL.

We have a new paper (link updated to published paper) in press at Geophysical Research Letters (see also the Auxiliary Materials and the Supplemental Website) that reports on a pseudoproxy assessment of the spatial performance of four reconstruction methods used to target Common Era temperatures. Large-scale (hemispheric and global) temperature reconstruction methods can generally be divided into two categories: (1) index methods; and (2) climate field reconstruction (CFR) methods. Index methods target mean hemispheric or global temperature time series, and therefore yield reconstructions of only these individual indices. In contrast, CFR methods attempt to reconstruct spatial patterns of temperature variability, thereby yielding spatial maps of temperature change over time. Each methodological approach has its advantages and disadvantages, but one important utility of CFRs is that they provide estimates of spatial variability and thus insights into the underlying dynamics of climatic changes that have occurred in the past. Two notable and recent examples are the Mann et al. (Nature, 2009) and Mann et al. (Science, 2009) studies, the former of which used regions of a global CFR to estimate hurricane variability during the last millennium and the latter of which used the same global CFR to gauge the underlying dynamical causes of the Little Ice Age and Medieval Climate Anomaly.

Despite the utility of CFRs and increasing interest in using them to characterize climate variability in the past, it is not widely appreciated that very few large-scale CFRs actually exist (there are many regional CFRs targeting multiple climatic variables, but I am only discussing efforts to reconstruct hemispheric and global temperatures in this post). For instance, the figure below plots the 2007 IPCC Fourth Assessment Report (AR4) summary of Northern Hemisphere (NH) temperature reconstructions for the Common Era. Of the twelve reconstructions summarized in the figure, only two are CFRs (both derived from the same underlying proxy data, but with different methods). Since the publication of the AR4, only one additional large-scale CFR has been published (Mann et al., Science, 2009). Thus, despite the great promise of these data products and the considerable effort that has already been devoted to producing them, the research is still nascent and much work remains to further refine CFR methodologies, characterize their uncertainties, and expand the multi-proxy networks used to produce them.

Modified from Figure 6.10 (left) in the 2007 IPCC AR4, showing the state-of-the-science collection of NH temperature reconstructions in 2007. The figure on the right shows the subset of reconstructions that are derived from CFRs, namely the Mann et al. (Nature, 1999) and Rutherford et al. (J. Clim., 2005) reconstructions (both are derived from the same underlying proxy data, but different methods).

It is in the spirit of CFR assessments and improvements that the work in our recent paper was pursued. Given that much of the motivation for deriving CFRs stems from the spatial information that they provide, we wanted to use pseudoproxy experiments to assess the spatial performance of the multiple methods currently used to derive large-scale temperature CFRs. It is worth noting that all previous pseudoproxy experiments have either tested index methods only, or have evaluated CFRs largely on their ability to provide estimates of NH means. But such assessments of CFRs are insufficient for evaluating their performance spatially. As an example of this, the figure below plots results from our paper showing the NH mean computed from the same method used to produce the Mann et al. (Science, 2009) CFR (for those who care, this is the hybrid RegEM-TTLS method, split at the 20-year period). The experiment, which in many ways is a best-case scenario, used the CCSM model as the basis of the pseudoproxy experiment and a pseudoproxy network approximating the most populated nest in the Mann et al. (Nature, 1998) multi-proxy network. The figure demonstrates that the reconstructed NH mean reproduces the known model mean quite skillfully, but that such performance is not predictive of how the method performs spatially: large regions of the reconstruction suffer from very small correlation coefficients or regional biases that approach as much as a degree Celsius. It is therefore essential to go beyond NH mean assessments of CFRs in order to fully characterize their spatial performance.

From Smerdon et al. (GRL, 2011) showing pseudoproxy results using the CCSM simulation as the model testbed and pseudoproxies approximating the Mann et al. (Nature, 1998) multi-proxy distribution and an SNR of 0.5, by standard deviation. The figures characterize a CFR derived from the hybrid RegEM-TTLS method using a frequency domain split at the 20-year period, as performed in Mann et al. (Science, 2009). Plotted results are (top) reconstructed NH mean and its comparison to the target, (bottom left) grid-point correlation coefficients between the CFR and target, and (bottom right) the grid-point mean biases computed between the CFR and target. The latter two statistics were computed over the reconstruction interval of 850-1855 C.E.
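For readers who want to see what these statistics amount to in practice, here is a minimal sketch of the three quantities plotted in the figure: an area-weighted mean time series, grid-point correlation coefficients, and grid-point mean biases. It is not the code used in the paper. The function names, the cosine-latitude weighting, and the synthetic "target" and "reconstruction" fields are all assumptions made for illustration. The toy reconstruction is built to share the large-scale signal with the target while being noisy locally and biased in one region.

```python
import numpy as np

def area_weighted_mean(field, lats):
    """Mean over space of a (time, lat, lon) field, weighted by cos(latitude)."""
    weights = np.cos(np.deg2rad(lats))[None, :, None]
    weights = np.broadcast_to(weights, field.shape)
    return (field * weights).sum(axis=(1, 2)) / weights.sum(axis=(1, 2))

def gridpoint_correlation(recon, target):
    """Pearson correlation through time at every grid point."""
    r = recon - recon.mean(axis=0)
    t = target - target.mean(axis=0)
    return (r * t).sum(axis=0) / np.sqrt((r**2).sum(axis=0) * (t**2).sum(axis=0))

def gridpoint_bias(recon, target):
    """Time-mean difference (reconstruction minus target) at every grid point."""
    return (recon - target).mean(axis=0)

# Synthetic example (hypothetical data, not model output).
rng = np.random.default_rng(0)
n_time, n_lat, n_lon = 500, 6, 12
lats = np.linspace(5.0, 85.0, n_lat)

common = rng.standard_normal(n_time)[:, None, None]  # shared large-scale signal
target = common + 2.0 * rng.standard_normal((n_time, n_lat, n_lon))
recon = common + 2.0 * rng.standard_normal((n_time, n_lat, n_lon))
recon[:, :3, :] += 0.5  # impose a regional warm bias on the three lowest latitude rows

mean_corr = np.corrcoef(area_weighted_mean(recon, lats),
                        area_weighted_mean(target, lats))[0, 1]
corr_map = gridpoint_correlation(recon, target)
bias_map = gridpoint_bias(recon, target)

print("correlation of spatial means:", round(float(mean_corr), 2))
print("average grid-point correlation:", round(float(corr_map.mean()), 2))
print("average bias in biased region:", round(float(bias_map[:3].mean()), 2))
```

In this toy setup the spatial means agree well because local noise averages out across grid points, while the grid-point correlations stay low and the imposed regional bias appears in the bias map. That is the same qualitative distinction the figure draws between NH-mean skill and spatial skill.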

Our study represents the first attempt to explicitly compare the spatial skill of four large-scale CFR methods using identical pseudoproxy experiments. We constructed our experiments using millennial simulations from two models and used pseudoproxy networks approximating the original Mann et al. (Nature, 1998) multi-proxy network and the richer Mann et al. (PNAS, 2008) network. We also tested the hybrid methodological choice, which divides the calibration process into high and low spectral domains, split at the 20-year period. Most of this will be too “inside baseball” for general readers, but the take-home message is that the study comprehensively tested multiple dependencies that have clouded direct comparisons between previous pseudoproxy experiments. Given all of this as setup, one of our most important conclusions is that all of the methods perform very similarly in terms of their spatial skill. This is first of all encouraging, because each of the methods is ultimately an approximation of the same multivariate regression model and one would expect them to perform similarly. But we also show many important spatial errors in the CFRs derived from the collection of methods, and there appears to be no method that universally behaves better (the small differences that do exist between methods are mostly just a tradeoff between bias and variance errors). The potential for these errors in real-world temperature reconstructions therefore needs to be better evaluated, and more attempts to characterize spatial uncertainties are necessary moving forward. Our work nevertheless also carries a caveat for pseudoproxy experiments: we observe some dependence of the spatial performance of the methods on the model simulation that was used. This suggests that careful work is needed to relate the spatiotemporal characteristics of model simulations to real-world climate fields before the implications of pseudoproxy experiments can be fully vetted.

Finally, we also derive an important conclusion for the data communities: the reconstructions tend to perform best where the pseudoproxy networks are densest, i.e. where data exists (see, for instance, the figure at the very top of this post). This suggests that one of the most important strategies for improving large-scale CFR performance is to collect more high-quality proxy data in under-sampled regions. That of course is easier said than done, but it is a fact that should not be overlooked.
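For those curious about the "hybrid" split at the 20-year period mentioned above, the basic idea of separating a time series into low- and high-frequency parts can be sketched as follows. This is only an illustration of a frequency-domain split. The hard FFT cutoff and the function name `split_at_period` are assumptions, and this is not the filter used in the hybrid RegEM implementation.

```python
import numpy as np

def split_at_period(series, period_years=20.0, dt_years=1.0):
    """Split an evenly sampled series into low- and high-frequency parts.

    Uses a hard spectral cutoff (an illustrative choice): Fourier components
    with periods of `period_years` or longer go to the low band, and
    everything else goes to the high band.
    """
    series = np.asarray(series, dtype=float)
    freqs = np.fft.rfftfreq(series.size, d=dt_years)  # cycles per year
    spectrum = np.fft.rfft(series)
    low_mask = freqs <= 1.0 / period_years
    low = np.fft.irfft(np.where(low_mask, spectrum, 0.0), n=series.size)
    high = series - low  # the two bands sum back to the original series
    return low, high

# Toy series: a 100-year oscillation plus a 5-year oscillation.
years = np.arange(1000)
slow = np.sin(2 * np.pi * years / 100.0)
fast = np.sin(2 * np.pi * years / 5.0)
low, high = split_at_period(slow + fast, period_years=20.0)
```

In a hybrid scheme, each band would then be calibrated separately and the two results recombined. A hard cutoff like this one can produce ringing on real data, so a smoother filter is usually preferable outside of a toy example.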

Collectively, our study is intended as a first step toward more evaluations of the spatial performance of CFRs with a focus on methodological improvements, expansion of current multi-proxy networks and the robust characterization and evaluation of spatial uncertainties. We have also publicly provided all of the CFRs that were completed for this study, as well as the underlying pseudoproxy networks and model target fields. We hope that these data can serve as both a baseline and testbed for further methodological investigations focused on the problem of producing large-scale CFRs of the Common Era.

PAPER REFERENCE: Smerdon, J.E., A. Kaplan, E. Zorita, J.F. González-Rouco, and M.N. Evans (2011), Spatial Performance of Four Climate Field Reconstruction Methods Targeting the Common Era, Geophysical Research Letters, 38, L11705, doi:10.1029/2011GL047372. [Auxiliary Materials] [Supplemental Website]

From Figure 1 in Smerdon et al. (GRL, 2011) showing grid cell correlation coefficients in a pseudoproxy experiment testing the Mann et al. (Nature, 1998) method. The experiment used the NCAR CCSM 1.4 millennial simulation as the model testbed and pseudoproxies with a signal-to-noise ratio of 0.5 by standard deviation. Pseudoproxies were sampled from locations approximating the most populated nest in the Mann et al. (Nature, 1998) multi-proxy network and are shown as black squares in the figure.
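As a rough illustration of what "a signal-to-noise ratio of 0.5 by standard deviation" means, the sketch below samples a model field at a few grid points and adds noise whose standard deviation is twice that of the local signal. The white Gaussian noise, the function name `make_pseudoproxies`, the grid indices, and the random stand-in field are all assumptions for the example. It does not reproduce the networks or simulations used in the paper.

```python
import numpy as np

def make_pseudoproxies(field, locations, snr=0.5, seed=0):
    """Sample a (time, lat, lon) field at grid indices and add white noise.

    SNR is defined by standard deviation: std(signal) / std(noise) = snr.
    """
    rng = np.random.default_rng(seed)
    lat_idx, lon_idx = np.asarray(locations).T
    signal = field[:, lat_idx, lon_idx]        # shape (time, n_proxies)
    noise_std = signal.std(axis=0) / snr       # one noise level per proxy site
    noise = rng.standard_normal(signal.shape) * noise_std
    return signal + noise

# Hypothetical stand-in for a model temperature field.
rng = np.random.default_rng(1)
field = rng.standard_normal((2000, 4, 8))
locations = [(0, 1), (2, 5), (3, 7)]           # (lat index, lon index) pairs

proxies = make_pseudoproxies(field, locations, snr=0.5)
signal = field[:, [0, 2, 3], [1, 5, 7]]
realized_snr = signal.std(axis=0) / (proxies - signal).std(axis=0)
```

With noise that is independent of the signal, an SNR of 0.5 corresponds to an expected correlation of 1/sqrt(5), or about 0.45, between each pseudoproxy and its local temperature series.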