Currently available datasets, including IBTrACS, TC-PRIMED, and the SHIPS developmental dataset, provide data that have been selected by domain experts to be useful in tropical cyclone (TC) applications. Because their tracks, input variables, and data processing are chosen according to our best understanding of the phenomena affecting TCs, they give researchers a valuable tool for developing models that exploit that understanding. However, the static nature of this selection makes it harder for studies to build on these datasets, for example by exploring new predictive variables, exploring tracks, or making effective use of spatiotemporal information. Additionally, the biases in each dataset need careful consideration, as over-reliance on any one data source can lead to conclusions based on artifacts of these biases.
To address these issues, we are developing TCBench, the first platform and value-added benchmark dataset, spanning 1980 to the present, for the data-driven prediction of tropical cyclones. Our user-friendly platform includes open preprocessing tools, evaluation protocols, visualization tools, and baseline prediction models to benefit the atmospheric science and AI communities.
In this presentation, we will showcase the importance of providing a flexible dataset and platform by focusing on recent breakthroughs in medium-range data-driven models, namely Pangu-Weather, GraphCast, and FourCastNet v2. These AI-driven models have been shown to compare well with the outputs of deterministic models, and we evaluate their use for both TC track and intensity predictions. We provide the outputs of these models within a framework that allows members of both the atmospheric science and data science communities to compare them effectively with traditional, physics-based TC forecasts. Furthermore, we demonstrate how to post-process these outputs for TC track and intensity prediction using convolutional architectures.
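As an illustration of the kind of comparison described above, the following minimal sketch computes a mean track error (great-circle distance) and a mean intensity bias between a forecast track and a reference track. It is not the TCBench evaluation protocol: the function names, the dictionary layout, and the use of maximum wind in knots as the intensity measure are assumptions made for this example only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def track_and_intensity_errors(forecast, reference):
    """Compare two tracks at their shared valid times.

    Both arguments are hypothetical dicts mapping a valid time (any hashable
    key) to a tuple (lat_deg, lon_deg, max_wind_kt).
    Returns (mean track error in km, mean intensity bias in kt,
    mean absolute intensity error in kt). A negative bias means the
    forecast is weaker than the reference.
    """
    common = sorted(set(forecast) & set(reference))
    if not common:
        raise ValueError("forecast and reference share no valid times")

    distances, wind_diffs = [], []
    for t in common:
        f_lat, f_lon, f_wind = forecast[t]
        r_lat, r_lon, r_wind = reference[t]
        distances.append(great_circle_km(f_lat, f_lon, r_lat, r_lon))
        wind_diffs.append(f_wind - r_wind)

    n = len(common)
    mean_track_error = sum(distances) / n
    mean_bias = sum(wind_diffs) / n
    mean_abs_error = sum(abs(d) for d in wind_diffs) / n
    return mean_track_error, mean_bias, mean_abs_error
```

The same function could be applied to any forecast source, data-driven or physics-based, provided its output is reduced to positions and intensities at valid times that match the reference track.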
Still, we note that it is highly unlikely that advances in the use of ML models for weather forecasting will stall at current implementations. This is especially relevant given that models are currently generally trained on ERA5, which is known to have a negative bias in TC intensity. By creating a framework for adding outputs from new models (both data-driven and otherwise), we hope to ensure that the community can compare the newest models with both the currently leading physics-based numerical weather prediction (NWP) models and AI-driven models.
It is also important to compare the outputs of models developed in research, as comparisons to well-established baselines allow the community to measure the advantages and disadvantages of different models. The proper implementation of these baselines remains a challenge that researchers must often address themselves. We therefore aim to provide baseline models that researchers can run locally via TCBench, thereby promoting objective evaluations of model performance. Here, we showcase baseline models for the sequential prediction of TC intensity changes, including 1D convolutional neural networks, long short-term memory neural networks, and transformer-based architectures.
In conclusion, by emphasizing flexibility in data experimentation, we provide a data foundation that can easily be enhanced as TC research advances. The ultimate goals of this dataset and platform are to allow the research community to improve tropical cyclone forecasting accuracy, to increase understanding of the physical processes that govern TC behavior, to obtain more reliable future TC projections, and, ultimately, to contribute to the mitigation of TC impacts on local communities and society at large.

