Monday, 29 January 2024: 4:30 PM
347/348 (The Baltimore Convention Center)
In power system operations, ensuring a continuous supply of energy is paramount. The overhead electric power grid is highly vulnerable to severe storms, which are usually accompanied by intense lightning activity. Cloud-to-ground (CG) lightning is a frequent cause of power outages, so accurate and timely forecasting is needed to mitigate potential hazards. We present the development of a Machine Learning (ML) model that forecasts CG lightning over the next 10 min for Parana, in southern Brazil. The primary objective is to safeguard lives and power infrastructure, in line with Goal 7 of the United Nations Sustainable Development Goals. In Brazil, the National Electric Energy Agency (ANEEL) oversees power distribution companies to ensure minimum levels of service quality. Distributors are subject to sanctions, and may even lose their distribution rights, if they do not meet the continuity standards set by ANEEL. These standards are measured by two indicators: the equivalent duration of interruption per consumer unit and the number of times a consumer unit was without power in the period considered. Power outages are excluded from these indicators if they are caused by events beyond the distributors' control, such as severe weather. Operational records of power outages reported by automatic sensors in Parana from 2017 to 2021 indicate that at least one power outage occurred in the region every day. Disregarding momentary failures, i.e., power outages lasting less than three minutes, 98.3% of days had at least one power outage event, and the power company attributed 28.6% of them directly to severe weather and lightning strikes. We found a correlation of 0.98 between lightning incidence and power outages from 2017 to 2021, which suggests that more outages may be lightning-related than the power company's records indicate. ML is useful in atmospheric studies because it can extract information and identify patterns in vast datasets, enabling the development of analysis and prediction models.
The cornerstone of ML is the dataset: for a machine to learn about a problem, it must be supplied with data pertinent to that problem. The research therefore began with data collection, gathering historical data from a local Lightning Detection and Location Network (LDLN). We considered CG data in Parana from 2018 to 2022: date, time, latitude, longitude, and peak current. The data were preprocessed through cleaning, normalization, and feature engineering to produce a dataset suitable for training and validating the ML model. First, we excluded unreliable data, namely CG events detected by fewer than three LDLN sensors. For the nowcasting task, the region of Parana was divided into cells of 0.1° by 0.1°, each roughly equivalent to an area of 100 km², forming a grid of 6300 pixels. The features for the model were the normalized CG density values of the past 30 minutes in the 24 pixels surrounding the target pixel. Together, the features and targets form the labeled dataset of CG events in Parana from 2018 to 2022. Data from 2018 to 2021 were used for the training and validation datasets, and data from 2022 constitute the test dataset. Because our proposal defines a regression task, three ML regressors were selected for experimentation: Linear Regression, Gradient Boosting, and Random Forests. These algorithms were chosen for their satisfactory performance in meteorological forecasting in the reviewed literature. They were trained and then evaluated with the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²). Gradient Boosting produced the best results on the validation dataset, with an MAE of 0.00842, and was therefore selected for application to the test dataset.
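As an illustration of the feature construction described above, the following is a minimal sketch of how the 24 surrounding-pixel values could be gathered for one target pixel. It assumes that the 24 pixels form a 5 by 5 window centered on the target, that the density grid is a plain list of rows, and that pixels outside the grid count as zero; none of these details are stated in the abstract, and the function name `neighbor_features` is hypothetical.

```python
# Sketch only: the 5 x 5 window, the zero padding at the grid border, and the
# function name are assumptions for illustration, not the authors' code.

def neighbor_features(density, row, col, radius=2):
    """Return the density values of the pixels around (row, col).

    With radius=2 the window is 5 x 5, which yields 24 surrounding pixels
    once the target pixel is excluded. Pixels that fall outside the grid
    are filled with 0.0 (assumption).
    """
    n_rows, n_cols = len(density), len(density[0])
    features = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue  # skip the target pixel itself
            r, c = row + dr, col + dc
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            features.append(density[r][c] if inside else 0.0)
    return features


# Toy 6 x 6 grid of normalized CG density for one past 30-minute window.
grid = [[0.0] * 6 for _ in range(6)]
grid[2][3] = 0.5  # a neighbor of the target pixel (2, 2)
grid[2][2] = 0.9  # the target pixel itself, which is excluded

feats = neighbor_features(grid, 2, 2)
corner_feats = neighbor_features(grid, 0, 0)  # window partly outside the grid
print(len(feats), sum(feats))
```

Each feature vector built this way would be paired with the CG density observed in the target pixel over the following 10 min to form one labeled example for the regressors.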
The forecasts made by the Gradient Boosting model on the test dataset were grouped into five classes: Class 0 - 0 CG; Class 1 - 1 to 10 CG; Class 2 - 11 to 20 CG; Class 3 - 21 to 40 CG; and Class 4 - 41 to 150 CG. This was done because the impact of one or two lightning strikes in a 100 km² area is effectively the same, and because classes allow metrics such as accuracy, precision, and recall to be computed. The ML model predicted the occurrence of CG and estimated the number of events in a pixel with an accuracy greater than 97% for all classes, both for isolated events and for severe thunderstorms. Nonetheless, its precision and recall were unsatisfactory for the higher classes, which calls for a re-evaluation of the model. The capacity to predict few and isolated CGs is a key feature of the model, as any single CG can be disruptive. However, the model predicted some Class 0 pixels as Class 1, overestimating the number of pixels with CG and producing a more homogeneous pattern of CG occurrence than in the observed data. The model also underestimated density values in cases of greater CG activity, assigning Class 4 pixels to Class 3. We attribute this flaw to the insufficient number of Class 4 examples in the training dataset. This study presents the development of an ML model to forecast CG for the next 10 min, offering power companies a valuable tool for risk management and operational planning. Future steps include balancing the training dataset and increasing the lead time from 10 min to up to one hour. Additionally, a database of past lightning occurrences related to power outages will be constructed to help power companies document outages associated with CG. Improving the observation and forecasting of these high-impact weather events supports power companies' decision-making and active measures, and helps them anticipate and mitigate damage.
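The class grouping and the per-class precision and recall mentioned above can be sketched as follows. The class limits come from the text; everything else is an assumption made for illustration: that forecasts have already been converted back to CG counts per pixel, that counts are rounded to whole numbers, that counts above 150 are placed in Class 4, and the helper names `to_class` and `precision_recall`. The toy counts are invented and are not results from the study.

```python
# Sketch only: helper names, rounding, and the toy data are assumptions.
CLASS_UPPER_BOUNDS = [0, 10, 20, 40, 150]  # upper CG count of Class 0 to Class 4


def to_class(cg_count):
    """Map a CG count per pixel to one of the five classes from the text."""
    count = max(0, round(cg_count))
    for label, upper in enumerate(CLASS_UPPER_BOUNDS):
        if count <= upper:
            return label
    return len(CLASS_UPPER_BOUNDS) - 1  # above 150: kept in Class 4 (assumption)


def precision_recall(observed, predicted, label):
    """Precision and recall for one class, treating it as the positive class."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == label and p == label)
    fp = sum(1 for o, p in pairs if o != label and p == label)
    fn = sum(1 for o, p in pairs if o == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy counts for five pixels (not data from the study).
observed_counts = [0, 0, 5, 45, 50]
forecast_counts = [0, 2, 7, 35, 60]

observed = [to_class(c) for c in observed_counts]
predicted = [to_class(c) for c in forecast_counts]

for label in range(len(CLASS_UPPER_BOUNDS)):
    print(label, precision_recall(observed, predicted, label))
```

In this toy case one Class 0 pixel is forecast as Class 1 and one Class 4 pixel as Class 3, mirroring the two error types described above: precision drops for Class 1 and recall drops for Class 4.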

