The archive can be accessed at http://www.wxqa.com/lum_search.htm and a data report prepared in February 2015 is available at http://wxqa.com/solar_data_archive_fin.pdf . A data parsing routine has been prepared and is published at www.github.com/lohancock/solar-data-parser. A parsed archive is available as an Amazon Redshift database that people are invited to clone (contact either author).
This presentation sets out the dimensions of the archives of raw and parsed data and some tools for quality assessment.
The archive accepts data from all sources willing to contribute. Some of these sources are known, such as national meteorological agencies, but much of the data is essentially crowd-sourced. As with any crowd-sourced dataset, quality control is important: station locations and data collection times may not be what is reported. In practice this does happen, though fortunately rarely. One means of quality control already exists. Every day, the Citizen Weather Observer Program (CWOP), parent to the solar radiation data archive, quality-checks all subscribing weather stations by comparing weather-model output with station observations. This comparison shows whether stations are where they say they are, whether their data were collected when they say they were, and whether their meteorological instruments are calibrated. However, not all stations contributing to the archive subscribe to the daily quality check, and solar radiation is not included in the comparison. To close this gap, we have paired every observation of solar radiation with a modeled value for comparison.
In summary, the parsing routine does the following:
• From each report it extracts the transmitted values: station name, latitude, longitude, date, time of day, solar radiation measurement, other meteorological measurements, and a description of the hardware. In doing so it rejects observations, coordinates, or time stamps that are out of format, and it flags any observation whose date falls outside the time window covered by the daily archive.
• To each observation it adds the solar zenith angle, the solar azimuth angle, and the equation of time for the reported latitude, longitude and time, computed with Javier Corripio's insol routine. These support quality checks. For example, they make it possible to identify solar radiation reported during local night.
• To each observation it adds a modeled value of solar radiation (direct plus diffuse), again computed with Corripio's insol routine. The model inputs are: the report's latitude, longitude, date, time of day, temperature and relative humidity (if any of these is missing, no modeled value is computed); elevation, from a DEM lookup at gpsvisualizer.com; visibility, uniformly set to 90 km; and albedo and ozone, set to average values.
Comparing observed and modeled solar radiation L over a station's time series provides a visual check of the station's reported latitude, longitude and clock.
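• To automate that quality check, the routine also provides, for each observation, the comparison statistic
$$ t = -\ln\left( \frac{L_{\mathrm{observed}}}{L_{\mathrm{modeled}}} \right) $$
which is zero when observation and model agree, positive when the observed value falls below the modeled one, and negative when it exceeds it.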
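The extraction and format checks in the first step of the parsing routine can be sketched as below. This is a minimal sketch, not the published parser. It assumes each report has already been split into a dictionary of string fields. The field names (`station`, `lat`, `lon`, `time`, `solar`), the ISO-style UTC timestamp, and the `parse_record` helper are hypothetical and do not reflect the actual report format. Following the description above, out-of-format values cause rejection, while an out-of-window date only sets a flag.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional

@dataclass
class Observation:
    station: str
    lat: float
    lon: float
    time_utc: datetime
    solar_wm2: float
    outside_archive_day: bool  # flagged, not rejected

def parse_record(fields: dict, archive_day: date) -> Optional[Observation]:
    """Return an Observation, or None if any required value is out of format.

    Hypothetical input: a dict of string fields with an ISO-style UTC timestamp.
    """
    try:
        lat = float(fields["lat"])
        lon = float(fields["lon"])
        when = datetime.strptime(fields["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        solar = float(fields["solar"])
    except (KeyError, ValueError, TypeError):
        return None  # missing field or unparseable value
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None  # coordinates out of range (NaN also fails this test)
    return Observation(
        station=fields.get("station", ""),
        lat=lat, lon=lon, time_utc=when, solar_wm2=solar,
        outside_archive_day=(when.date() != archive_day),
    )
```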
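The solar geometry attached to each observation, and the night-time check it enables, can be illustrated with the approximate formulas in NOAA's "General Solar Position Calculations." This is a swapped-in technique, not Corripio's insol routine, and it is a sketch only. Azimuth is omitted for brevity. The function names, the 5° margin below the horizon, and the 10 W/m² threshold in `reported_at_night` are hypothetical choices.

```python
import math
from datetime import datetime, timezone

def solar_geometry(lat_deg: float, lon_deg: float, when: datetime) -> tuple[float, float]:
    """Return (solar zenith angle in degrees, equation of time in minutes).

    Approximation from NOAA's "General Solar Position Calculations"
    (Fourier-series fits); NOT Corripio's insol routine. Naive datetimes
    are treated as UTC; longitude is positive east.
    """
    if when.tzinfo is not None:
        when = when.astimezone(timezone.utc)
    doy = when.timetuple().tm_yday
    hour = when.hour + when.minute / 60 + when.second / 3600
    g = 2 * math.pi / 365 * (doy - 1 + (hour - 12) / 24)  # fractional year, radians
    eot = 229.18 * (0.000075 + 0.001868 * math.cos(g) - 0.032077 * math.sin(g)
                    - 0.014615 * math.cos(2 * g) - 0.040849 * math.sin(2 * g))
    decl = (0.006918 - 0.399912 * math.cos(g) + 0.070257 * math.sin(g)
            - 0.006758 * math.cos(2 * g) + 0.000907 * math.sin(2 * g)
            - 0.002697 * math.cos(3 * g) + 0.00148 * math.sin(3 * g))  # radians
    true_solar_min = hour * 60 + eot + 4 * lon_deg  # true solar time, minutes
    ha = math.radians(true_solar_min / 4 - 180)  # hour angle, 0 at solar noon
    lat = math.radians(lat_deg)
    cos_z = math.sin(lat) * math.sin(decl) + math.cos(lat) * math.cos(decl) * math.cos(ha)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_z)))), eot

def reported_at_night(zenith_deg: float, solar_wm2: float,
                      margin_deg: float = 5.0, threshold_wm2: float = 10.0) -> bool:
    """Flag radiation reported while the sun is well below the horizon (hypothetical thresholds)."""
    return zenith_deg > 90.0 + margin_deg and solar_wm2 > threshold_wm2
```

The margin below the horizon allows for twilight and for small timing errors, so only clearly implausible night-time readings are flagged.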
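The comparison statistic t can be computed and summarized per station along the following lines. This is a minimal sketch. The function names are hypothetical, and summarizing a station by the median of t is an illustrative choice, not necessarily how the parser or the authors aggregate it. The model inputs described above include no cloud information, so clouds and shading would push t above zero. The median damps such outliers better than the mean.

```python
import math
from statistics import median
from typing import Iterable, Optional

def attenuation(l_observed: float, l_modeled: float) -> Optional[float]:
    """t = -ln(L_observed / L_modeled); None when the logarithm is undefined."""
    if l_observed <= 0 or l_modeled <= 0:
        return None  # e.g. zero readings at night, or no modeled value
    return -math.log(l_observed / l_modeled)

def station_median_t(pairs: Iterable[tuple[float, float]]) -> Optional[float]:
    """Median of t over (observed, modeled) pairs, skipping undefined values."""
    values = [t for obs, mod in pairs if (t := attenuation(obs, mod)) is not None]
    return median(values) if values else None
```

A station whose median t is strongly negative reports more radiation than the model allows. That may point to a location, clock, or calibration problem worth inspecting in the station's time series.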
To show how the data are used, we present illustrations of a selection of trends and cycles that we found interesting, some previously known and some new. They are accompanied by a compendium, prepared in the spirit of reproducible research, that shows how the key figures were made.
Whether one's interest is in dust rings or in more conventional applications such as solar energy planning and weather forecasting, the database is publicly available, and the authors will be pleased if it proves useful.
Supplementary URLs: http://wxqa.com/lum_search.htm and https://github.com/lohancock/solar-data-parser/blob/master/ams2016.Rpres