At present, we draw upon 27 publicly available historical observation networks within the Western Electricity Coordinating Council (WECC) domain, comprising a total of 16,079 weather stations with records spanning 1980 to 2022. Stations are retained only if they report at least one of eight primary meteorological variables relevant to electricity demand planning, asset placement, resiliency, and estimating renewable energy generation capacity needs: air temperature at two meters, dew point temperature, air pressure, precipitation, relative humidity, wind speed, wind direction, and solar radiation. Network sizes range from fewer than 10 stations (e.g., VCAPCD, SHASAVAL, CDEC) to more than 1,000 stations (e.g., CWOP, RAWS, HADS). Raw observational data are stored in their native formats in a single cloud location.
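To make the station-screening rule concrete ("at least one" of the eight variables is enough), here is a minimal Python sketch. It assumes each station's reported variables have already been renamed to CF standard names. The `TARGET_VARIABLES` mapping, its unit choices, the `filter_stations` helper, and the station IDs are hypothetical illustrations, not the project's code.

```python
# Hypothetical mapping of the eight target variables to CF standard names
# and canonical units. The exact names/units used by the project are an
# assumption (e.g., air_pressure vs. surface_air_pressure).
TARGET_VARIABLES = {
    "air_temperature": "K",  # 2 m air temperature
    "dew_point_temperature": "K",
    "air_pressure": "Pa",
    "precipitation_amount": "kg m-2",
    "relative_humidity": "%",  # CF canonical unit is 1; percent assumed here
    "wind_speed": "m s-1",
    "wind_from_direction": "degree",
    "surface_downwelling_shortwave_flux_in_air": "W m-2",  # solar radiation
}


def filter_stations(stations):
    """Keep stations that report at least one target variable.

    `stations` maps a (hypothetical) station ID to the variable names it
    reports, assumed already renamed to CF standard names. Non-target
    variables are dropped from each retained station.
    """
    targets = set(TARGET_VARIABLES)
    eligible = {}
    for station_id, variables in stations.items():
        kept = sorted(set(variables) & targets)
        if kept:  # any measurement of a target variable qualifies
            eligible[station_id] = kept
    return eligible


if __name__ == "__main__":
    stations = {
        "STN_A": ["air_temperature", "wind_speed", "soil_temperature"],
        "STN_B": ["soil_temperature"],
    }
    print(filter_stations(stations))
    # {'STN_A': ['air_temperature', 'wind_speed']}
```

Under this rule a station reporting only wind direction would still be kept; whether partial-variable stations are weighted differently downstream is outside what this sketch assumes.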
Building a single, unified weather observation data platform from individual networks poses multiple challenges, because the networks differ in their internal quality assurance/quality control (QA/QC) procedures (or lack thereof), instrument reporting heights, and metadata breadth. We first standardize our input observation networks by processing the raw native data into a single netCDF file per station containing all observations over the period of record, with CF-compliant variable names, standard units, a consistent encoding of missing values, and non-relevant variables removed. Because not all input networks perform their own internal QA/QC checks, standardizing the raw input in this manner allows a consistent approach across networks. We deliberately retain any per-variable, per-observation QA/QC information that a network provides, both to ensure that suspect observations are flagged appropriately by our own QA/QC procedure and as a secondary check on the utility of our methods. Our QA/QC procedures mirror the HadISD protocol, iteratively refining its methods into an automatic detection approach tailored to western North America, with the goal of producing a robust dataset with a minimal false-positive flag rate. Particular attention is paid to the consecutive values test and the spike check, which typically overflag data during events characteristic of western North America (e.g., Santa Ana/Diablo winds, marine layer intrusions).
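The consecutive values test and spike check can be sketched in simplified form as follows. This is not the HadISD implementation: HadISD's checks are more involved (for instance, deriving spike thresholds from each station's own distribution of first differences), whereas the hypothetical helpers `flag_consecutive_values` and `flag_spikes` below use fixed thresholds (`max_run`, `threshold`) chosen purely for illustration. Missing values are assumed to be encoded as `None`.

```python
from itertools import groupby


def flag_consecutive_values(values, max_run=6):
    """Flag runs of identical, non-missing values longer than max_run.

    max_run is a hypothetical threshold; in practice it would vary by
    variable and reporting interval. Returns one boolean per input value
    (True = flagged as suspect).
    """
    flags = []
    for value, group in groupby(values):  # groups consecutive equal values
        run_length = len(list(group))
        suspect = value is not None and run_length > max_run
        flags.extend([suspect] * run_length)
    return flags


def flag_spikes(values, threshold):
    """Flag single-point spikes using a fixed (hypothetical) threshold.

    A value is flagged when it jumps more than `threshold` away from the
    previous value and the next value jumps more than `threshold` back in
    the opposite direction. Points adjacent to a missing value are skipped.
    """
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if prev is None or cur is None or nxt is None:
            continue
        jump_in, jump_out = cur - prev, nxt - cur
        if (abs(jump_in) > threshold and abs(jump_out) > threshold
                and jump_in * jump_out < 0):
            flags[i] = True
    return flags


if __name__ == "__main__":
    temps = [20.1, 20.3, 35.0, 20.4, 20.2]
    print(flag_spikes(temps, threshold=5.0))
    # [False, False, True, False, False]
    print(flag_consecutive_values([100.0] * 8 + [97.0]))
    # eight True values, then False
```

These thresholds are where over-flagging originates: with a short `max_run`, relative humidity held at 100% under a persistent marine layer would be flagged even though the observations are valid.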
Furthermore, given recent extreme weather in California, we examine how the updated quality assurance/quality control process handles case studies of extreme weather events, including the September 2022 California heatwave, the June 2021 North American heatwave (which reportedly exceeded HadISD extreme temperature thresholds by 5°C), and the 2022–2023 California atmospheric river event and the flooding that followed. The final step in our approach is to derive missing variables, deduplicate stations, homogenize the time series of specific networks (ASOS/AWOS), and standardize all observations to hourly resolution.
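The variable-derivation and hourly-standardization steps can be illustrated with a minimal sketch. It assumes dew point is derived from air temperature and relative humidity via the Magnus approximation (coefficients b = 17.62 and c = 243.12 °C, one common choice) and that sub-hourly data are averaged or summed into hour-beginning bins. The helpers `dew_point_c` and `to_hourly` are hypothetical; the project's actual formulas and aggregation rules may differ.

```python
import math
import statistics
from collections import defaultdict
from datetime import datetime

# Magnus-formula coefficients (one common choice; an assumption here).
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # degC


def dew_point_c(temp_c, rh_percent):
    """Approximate dew point (degC) from air temperature (degC) and RH (%)."""
    if not 0 < rh_percent <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rh_percent / 100.0) + MAGNUS_B * temp_c / (MAGNUS_C + temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def to_hourly(observations, how="mean"):
    """Aggregate (datetime, value) pairs to hourly resolution.

    Timestamps are floored to the top of the hour (hour-beginning
    convention, assumed). Use how="mean" for state variables such as
    temperature and how="sum" for per-interval accumulations such as
    precipitation. Missing values (None) are skipped; hours with no
    valid data are omitted from the result.
    """
    reducers = {"mean": statistics.mean, "sum": math.fsum}
    if how not in reducers:
        raise ValueError(f"unsupported aggregation: {how!r}")
    buckets = defaultdict(list)
    for timestamp, value in observations:
        if value is None:
            continue
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: reducers[how](vals) for hour, vals in sorted(buckets.items())}


if __name__ == "__main__":
    print(round(dew_point_c(20.0, 50.0), 1))  # 9.3
    obs = [  # synthetic 20-minute temperatures
        (datetime(2022, 1, 1, 14, 0), 40.0),
        (datetime(2022, 1, 1, 14, 20), 41.0),
        (datetime(2022, 1, 1, 14, 40), 42.0),
        (datetime(2022, 1, 1, 15, 0), None),
    ]
    print(to_hourly(obs))  # {datetime.datetime(2022, 1, 1, 14, 0): 41.0}
```

Summing is only appropriate for per-interval amounts; a network that reports running precipitation totals would first need to be differenced before hourly aggregation.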
Future iterations of this work will incorporate additional privately owned weather station networks operated by investor-owned utilities (IOUs), develop additional meteorological parameters of interest to the electricity sector (e.g., wildfire and drought indices), and produce a gridded geospatial data product, which will support future downscaling efforts and deliver weather information at the spatial resolution and (hourly) time step needed for energy planning. Significant effort has gone into building credibility with users of weather information in the energy sector, and further effort is needed to encourage widespread adoption in the face of climate change. After final quality assurance/quality control checks and standardization, this novel data platform will be available for stakeholders to extract and download through a Jupyter Notebook and open-source code on the Cal-Adapt: Analytics Engine, an open-source, co-produced big data computational platform supporting California's Fifth Climate Assessment, as well as on AWS through the Open Data Registry. These data will support current efforts in electricity demand planning, asset placement, resiliency, and estimating renewable energy generation capacity needs, and will serve as a critical localized dataset for future downscaling of global climate models in the variables and resolution (hourly, sub-3 km) needed for energy sector applications.

