Although there are numerous reasons for performing a verification analysis, there are usually two general questions that are of interest: are the forecasts good, and can we be confident that the estimate of forecast quality is not misleading? When calculating a verification score, it is not usually obvious how the score can answer either of these questions. Some procedures for attempting to answer the questions are reviewed, with particular focus on p-values and confidence intervals. P-values are shown to be rather unhelpful in answering either question, especially when applied to probabilistic verification scores, and confidence intervals are to be preferred. However, confidence intervals cannot reveal biases in the value of a score that arises from an inadequate experimental design for testing on truly out-of-sample observations. Some specific problems with cross validation are highlighted. Finally, in the interests of increasing the insight into forecast strengths and weaknesses and in pointing towards methods for improving forecast quality, a plea is made for a more discriminating selection of verification procedures than has been adopted to date. Copyright (c) 2008 Royal Meteorological Society.
296PZTimes Cited:2Cited References Count:63