Currently, an operational neural network (NN) model is used to predict water temperatures and potential cold-stunning events. The predictions provide advance warning that helps community stakeholders prepare precautionary actions to mitigate the impact of cold-stunning events in southern Texas.
Recent discussions with stakeholders revealed a need for uncertainty estimates to accompany these predictions. Prior research has examined methods to improve model performance (by comparing different model structures; Duff et al., 2023) and to provide uncertainty information for water temperature predictions (by using ensemble methods; White et al., 2023). Our research goal is to analyze the variability in model performance under two cross-validation methods: k-fold and stratified k-fold cross-validation.
For this study, Laguna Madre water and air temperature measurements from 2012-2022 (excluding 2021) are used as inputs to the NN model. The dataset is divided into three subsets (i.e., training, validation, and testing). Folds for both cross-validation methods are defined using each year as a natural divider (10 years = 10 folds). For the k-fold method, the training (8 years), validation (1 year), and testing (1 year) sets are rotated sequentially 10 times, yielding 10 distinct folds. For the stratified k-fold method, only years that contained cold-stunning events are used in the validation and testing sets, so that the folds capture model performance when lower water temperatures are observed. An additional 10 folds are created using this approach.
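The year-based fold construction described above can be sketched as follows. This is a minimal illustration, not the study's code. The exact rotation order is an assumption: in the k-fold sketch, the test year is the i-th year and the validation year is the next one (wrapping around). In the stratified sketch, folds are assumed to be ordered (test, validation) pairs of distinct cold-stunning years. The function names and the cold-year list passed in are hypothetical, because the abstract does not state which years contained cold-stunning events.

```python
from itertools import permutations

# 2012-2022 with 2021 excluded: 10 years, one fold per year.
YEARS = [y for y in range(2012, 2023) if y != 2021]


def kfold_by_year(years):
    """Hypothetical helper: rotate test/validation years sequentially.

    Assumption: fold i tests on years[i], validates on the next year in the
    list (wrapping around), and trains on the remaining 8 years.
    """
    n = len(years)
    folds = []
    for i in range(n):
        test = years[i]
        val = years[(i + 1) % n]
        train = [y for y in years if y not in (test, val)]
        folds.append({"train": train, "val": val, "test": test})
    return folds


def stratified_kfold_by_year(years, cold_years, n_folds=10):
    """Hypothetical helper: validation and test years come only from cold-event years.

    Assumption: folds are the first n_folds ordered (test, val) pairs of
    distinct cold-event years; all other years are used for training.
    """
    folds = []
    for test, val in permutations(cold_years, 2):
        if len(folds) == n_folds:
            break
        train = [y for y in years if y not in (test, val)]
        folds.append({"train": train, "val": val, "test": test})
    return folds
```

For example, `kfold_by_year(YEARS)[0]` tests on 2012, validates on 2013, and trains on the other eight years. With four cold-event years there are 12 ordered pairs, so the stratified sketch keeps the first 10. Any real cold-year list would have to come from the study's event records.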
This research will benefit the development of future models. Analyzing model variability under both cross-validation methods will provide insight into how model performance is affected by particular combinations of training data, and which combination of training, validation, and testing years is optimal for predicting low water temperatures across different lead times.
To evaluate how different training, validation, and testing datasets affect model performance, several performance metrics are used (e.g., mean error (ME), mean absolute error (MAE), mean 10% maximum error, ME below 12°C, and MAE below 12°C (MAE<12)). Model performance will also be evaluated at 12-, 48-, and 96-hour lead times. Determining the most effective cross-validation method for the model will be an important step toward providing the needed uncertainty quantification for water temperature predictions in the Laguna Madre, supporting improved mitigation strategies during rare cold events in southern Texas.
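The metrics listed above can be sketched for one fold and one lead time as shown below. This sketch rests on several assumptions that the abstract does not confirm. Error is taken as predicted minus observed. The "mean 10% maximum error" is taken as the mean of the largest 10% of absolute errors. The 12°C threshold is applied to observed water temperature. The function name `performance_metrics` is hypothetical.

```python
import math

import numpy as np


def performance_metrics(pred, obs, threshold=12.0, top_frac=0.10):
    """Hypothetical helper: ME, MAE, mean 10% max error, ME<12, MAE<12."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    err = pred - obs          # assumption: error = predicted - observed
    abs_err = np.abs(err)

    # Assumed definition: mean of the largest top_frac share of absolute errors.
    n_top = max(1, math.ceil(top_frac * abs_err.size))
    largest = np.sort(abs_err)[-n_top:]

    # Assumption: "below 12°C" refers to the observed water temperature.
    cold = obs < threshold
    metrics = {
        "ME": err.mean(),
        "MAE": abs_err.mean(),
        "Mean10%MaxErr": largest.mean(),
        "ME<12": err[cold].mean() if cold.any() else np.nan,
        "MAE<12": abs_err[cold].mean() if cold.any() else np.nan,
    }
    return {name: float(value) for name, value in metrics.items()}
```

In a full evaluation, this function would be called once per fold and per lead time (12, 48, and 96 hours). The spread of each metric across the 10 folds of each method would then indicate model variability.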

