Addressing an Energy Sector Need for Comprehensive Quality Controlled Historical Weather Data Platform in Western North America

Ford, Victoria; Ford, Victoria

Despite years of scientific development and millions of dollars in federal investment, energy sector planners struggle to access weather observations that are appropriate for energy system modeling and planning. The reasons are myriad and span human, legislative, and technical challenges, including: incomplete or arcane metadata and documentation, restrictive licenses that conflict with open data / open government requirements, frequent missing values, missing climate variables of interest (often solar), and scientific data provided in formats that are unusable by the energy sector. For example, weather and climate data are usually gridded, whereas calibrating energy system applications requires ground-based point observations. To address this need, our team engaged with energy sector planners, analysts, state agency decision-makers, and Investor Owned Utilities (IOUs) across the state of California to identify existing barriers to their use of historical weather data. Here, we present a comprehensive data assimilation platform of hourly historical data across western North America that has undergone rigorous quality control. The platform focuses on weather station level granularity to support energy sector resilience and safety, and will ultimately provide natural gas and electricity sector stakeholders with the variables and data resolution needed to support a clean energy transition. It directly responds to multiple stakeholder needs for understanding weather and climate information in near real-time, including the severity, duration, frequency, and rate of change over time of extreme weather events. It also supports downscaling projections, which have been limited by poor characterization of surface winds and surface radiation, especially in complex terrain throughout western North America.

At present, we draw upon 27 publicly available historical observation networks within the Western Electricity Coordinating Council (WECC) domain, comprising a total of 16,079 weather stations with records spanning 1980 to 2022. Stations are limited to those that report at least one of eight primary meteorological variables of interest in electricity demand planning, asset placement, resiliency, and estimating renewable energy generation capacity needs, namely: air temperature at two meters, dew point temperature, air pressure, precipitation, relative humidity, wind speed, wind direction, and solar radiation. Network sizes range from fewer than 10 stations (e.g., VCAPCD, SHASAVAL, CDEC) to more than 1,000 stations (e.g., CWOP, RAWS, HADS). Raw observational data are stored in their native format in a single location on the cloud.

A single unified weather observation data platform faces multiple challenges, because each contributing network has its own internal quality assurance/quality control procedures, or lack thereof, as well as its own instrument reporting heights and breadth of metadata. We first standardize the input observation networks by processing the raw native data into a single netCDF file per station that contains all observations over the time period, with CF-compliant naming conventions and standard units, a consistent encoding of missing values, and non-relevant variables removed. Because not all input networks perform their own internal quality assurance/quality control checks, standardizing the raw observational input in this manner allows for a consistent approach between networks. We intentionally retain any quality assurance/quality control information provided per variable and observation for any station, regardless of network, both to ensure that suspect observations are flagged appropriately by our quality assurance/quality control procedure and as a secondary check on the utility of our methods. Our quality assurance/quality control procedures mirror the HadISD protocol, and we iteratively refine these methods for automatic detection tailored to western North America, with the goal of producing a robust dataset of observations with a minimal false positive flag rate. We pay particular attention to the consecutive values test and spike checks, which typically overflag data during events specific to western North America (e.g., Santa Ana/Diablo winds, marine layer intrusions).
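As a rough illustration of the two tests named above, the following is a minimal sketch in Python and not the HadISD algorithms or the platform's own code. The function names, the fixed threshold, and the minimum run length are all hypothetical choices made for this example; operational checks typically derive such limits from each station's own statistics.

```python
from typing import List, Optional, Sequence


def flag_spikes(values: Sequence[Optional[float]], threshold: float) -> List[bool]:
    """Flag isolated one-step spikes: a jump larger than `threshold`
    that is immediately reversed by a jump of the opposite sign.
    (Simplified, hypothetical rule for illustration only.)"""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if prev is None or cur is None or nxt is None:
            continue  # a missing neighbour means this point cannot be evaluated
        step_in, step_out = cur - prev, nxt - cur
        if abs(step_in) > threshold and abs(step_out) > threshold and step_in * step_out < 0:
            flags[i] = True
    return flags


def flag_repeated_streaks(values: Sequence[Optional[float]], min_run: int) -> List[bool]:
    """Flag every member of a run of identical, non-missing values
    whose length is at least `min_run` (a simplified consecutive values test)."""
    flags = [False] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        run_ended = i == len(values) or values[i] is None or values[i] != values[start]
        if run_ended:
            if values[start] is not None and i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags


# Hourly air temperatures (deg C) with one isolated bad reading.
temps = [15.0, 15.2, 15.1, 27.0, 15.3, 15.4]
spike_flags = flag_spikes(temps, threshold=8.0)

# A sustained jump (a step, not a spike) is not flagged by this rule.
onset = [15.0, 15.2, 24.0, 24.3, 24.1]
onset_flags = flag_spikes(onset, threshold=8.0)

# Calm winds (m/s): a real run of zeros is flagged, an example of overflagging.
calm = [0.0, 0.0, 0.0, 0.0, 1.5, 2.0]
streak_flags = flag_repeated_streaks(calm, min_run=4)
```

The calm-wind case shows why fixed rules of this kind can overflag legitimate data: a physically plausible run of identical values is treated the same way as a stuck sensor, so thresholds generally need regional tuning.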
Given recent extreme weather in California, we also examine how the updated quality assurance/quality control process handles case studies of extreme weather events: the September 2022 California heatwave; the June 2021 North American heatwave, which reportedly surpassed HadISD extreme temperature protocols by 5°C; and the 2022-2023 California atmospheric river event and consequent flooding. The final step in our approach is to derive any variables that are missing, deduplicate stations, homogenize time series for specific datasets (ASOS/AWOS), and standardize all observations to hourly resolution.
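Two of these final steps, deriving a missing variable and standardizing to hourly resolution, can be sketched as follows. This is an illustrative example under stated assumptions, not the platform's implementation: relative humidity is derived from air temperature and dew point using a Magnus-type approximation with one common set of coefficients, and sub-hourly observations are reduced to hourly values by a simple mean, which is an assumed aggregation rule. The function names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Magnus-type approximation over water (one common coefficient set)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity(temp_c: float, dewpoint_c: float) -> float:
    """Derive relative humidity (%) when a station reports only
    air temperature and dew point temperature."""
    return 100.0 * saturation_vapor_pressure_hpa(dewpoint_c) / saturation_vapor_pressure_hpa(temp_c)


def to_hourly_mean(
    records: Iterable[Tuple[datetime, Optional[float]]]
) -> Dict[datetime, float]:
    """Average all non-missing observations that fall within each clock hour.
    Assumption: a plain mean is acceptable for the variable in question; it is
    not suitable for wind direction (circular) or precipitation (accumulated)."""
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for timestamp, value in records:
        if value is None:
            continue  # skip missing observations
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Sub-hourly air temperatures (deg C) with one missing value.
records = [
    (datetime(2022, 9, 5, 14, 5), 40.0),
    (datetime(2022, 9, 5, 14, 35), 42.0),
    (datetime(2022, 9, 5, 15, 10), 41.0),
    (datetime(2022, 9, 5, 15, 40), None),
]
hourly = to_hourly_mean(records)

# Relative humidity derived from a temperature / dew point pair.
rh = relative_humidity(temp_c=20.0, dewpoint_c=10.0)
```

Other aggregation choices, such as taking the observation nearest the top of the hour, would be equally easy to substitute; the abstract does not state which rule is used.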

Future iterations of this work will further incorporate privately owned weather station networks operated by IOUs, develop additional meteorological parameters of interest to the electricity sector (e.g., wildfire and drought indices), and produce a geospatial gridded data product, which will support future downscaling efforts and deliver weather information at the resolution and time step (hourly) necessary for energy planning. Significant efforts have been undertaken to build credibility with the users of weather information in the energy sector, and further efforts are needed to encourage widespread adoption in the face of climate change. After final quality assurance/quality control checks and standardization, stakeholders will be able to access, extract, and download data from this novel platform through a Jupyter Notebook and open-source code on the Cal-Adapt: Analytics Engine, an open-source, co-produced big data computational platform in support of California’s Fifth Climate Assessment, as well as on AWS through the Open Data Registry. These data will support current efforts in electricity demand planning, asset placement, resiliency, and estimating renewable energy generation capacity needs, and will serve as a critical localized dataset for future downscaling of global climate models in the variables and at the resolution (hourly, sub-3 km) needed in energy sector applications.
