However, these AI-based weather forecasting models have not yet been rigorously tested by the meteorological community, and their utility to operational forecasters is unknown. In this presentation we propose several studies to address these gaps, grouped into two central foci:
(1) Nature of AI models: AI-based models have very different characteristics from NWP models. Thus, in addition to applying evaluation procedures developed for NWP models, we need to develop procedures that test for AI-specific weaknesses. For example, the physics backbone of NWP models guarantees certain properties, such as dynamic coupling between fields, that AI-based models are not required to uphold. Developing suitable tests therefore requires a fundamental understanding of how AI-based models work.
(2) Forecaster Perspective: Weather forecasting models should be evaluated with respect to particular applications of weather forecasts, and it is critical to involve research meteorologists and operational forecasters in the evaluation process. Our initial evaluation of AI-based models in CIRA weather briefings revealed that these models have characteristics that make interpreting their forecasts fundamentally different from interpreting the physics-based NWP model predictions meteorologists are familiar with. For example, the increasing “blurriness” of AI-based predictions at longer lead times does not indicate weaker atmospheric circulations; it reflects growing forecast uncertainty. Evaluations aimed at specific meteorological phenomena and atmospheric processes will allow the community to make informed decisions about the environments and applications in which AI-based weather forecasting models may be safe and beneficial to use.
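One way the “blurriness” noted under (2) could be tracked across lead times is with a simple sharpness diagnostic. The sketch below is illustrative only and is not a procedure from this work: the function names `mean_gradient_energy` and `box_blur` are hypothetical, the field is assumed to be a plain list of rows on a regular grid, and a 3x3 mean filter merely stands in for the smoothing an AI-based forecast might show at longer lead times.

```python
def mean_gradient_energy(field):
    """Mean squared difference between neighbouring grid points.

    Hypothetical sharpness score: lower values mean a smoother field.
    Assumes a rectangular grid with at least two points.
    """
    rows, cols = len(field), len(field[0])
    total, count = 0.0, 0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:  # difference to the neighbour on the right
                total += (field[i][j + 1] - field[i][j]) ** 2
                count += 1
            if i + 1 < rows:  # difference to the neighbour below
                total += (field[i + 1][j] - field[i][j]) ** 2
                count += 1
    return total / count


def box_blur(field):
    """3x3 mean filter with edge clamping (a stand-in for forecast smoothing)."""
    rows, cols = len(field), len(field[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), rows - 1)
                    jj = min(max(j + dj, 0), cols - 1)
                    acc += field[ii][jj]
            row.append(acc / 9.0)
        out.append(row)
    return out


# Synthetic example: a checkerboard has the sharpest possible 0/1 contrast.
checker = [[float((i + j) % 2) for j in range(8)] for i in range(8)]
sharp = mean_gradient_energy(checker)
blurred = mean_gradient_energy(box_blur(checker))
print(sharp, blurred)  # the blurred field scores lower than the original
```

A score that declines steadily with lead time would flag smoothing of this kind. On its own, such a score cannot distinguish smoothing caused by uncertainty from genuinely weaker circulations, which is why forecaster interpretation remains necessary.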
In summary, AI-based weather forecasts have different characteristics from familiar dynamically based forecasts. A robust research plan that evaluates many of these characteristics is therefore needed to provide guidelines to operational forecasters and feedback to model developers. In this presentation we propose a number of characteristics to evaluate, present results we have already obtained, and suggest a research plan for future work.

