6B.5 Quality Control and Extreme Values of Surface Temperature Measurements from the South Alabama Mesonet

Tuesday, 12 January 2016: 2:30 PM
Room 355 (New Orleans Ernest N. Morial Convention Center)
Sytske Kimball, University of South Alabama, Mobile, AL


Temperature data collected from 1 January 2009 through 31 August 2015 at 11 University of South Alabama Mesonet (http://chiliweb.southalabama.edu/) stations were quality controlled and examined for extreme values. Each Mesonet station has four temperature probes: a Campbell Scientific Inc. (CSI) 107-LC thermistor at 1.5 and 9.5 m above ground level (AGL) and a CSI HMP45C platinum resistance temperature detector (PRT) at 2 and 10 m AGL. Data are collected at 1-minute intervals using a Campbell Scientific CR3000 datalogger. An automated quality control (QC) algorithm was developed to flag bad data using a range test and like-sensor tests. The statistical program JMP from SAS Institute Inc. was used to set appropriate thresholds for the range test and the four like-sensor tests. To determine upper and lower limits for the range test, the raw temperature data were sorted in ascending order. In most cases, a large jump existed between real and faulty values, which marked the limit. This was done for each of the four temperature sensors at all 11 stations. Results for the station at Grand Bay are shown below:


Height    Minimum T                Maximum T
1.5 m     6:33 CST 11 Jan 2010     14:13 CST 4 June 2011
2 m       6:34 CST 11 Jan 2010     14:13 CST 4 June 2011
9.5 m     6:32 CST 11 Jan 2010     14:57 CST 4 June 2011
10 m      6:38 CST 7 Jan 2014      14:57 CST 4 June 2011

Table 1: Lower and upper limits for each temperature (°C) probe at Grand Bay, Alabama.
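The sort-and-look-for-a-jump step can be sketched in a few lines of Python. This is only an illustration of the idea, not the JMP procedure used for the Mesonet data: the function name `range_limits`, the `min_jump` gap size, and the sample values are all assumptions made up for the example.

```python
def range_limits(temps, min_jump=20.0):
    """Return (lower, upper) limits bracketing the plausible temperatures.

    min_jump is a hypothetical gap size (deg C) treated as the break
    between real and faulty values; it is not a value from the abstract.
    """
    values = sorted(temps)
    mid = len(values) // 2
    lower_idx, upper_idx = 0, len(values) - 1
    # Walk down from the median until a large jump to the next lower value.
    for i in range(mid, 0, -1):
        if values[i] - values[i - 1] >= min_jump:
            lower_idx = i
            break
    # Walk up from the median until a large jump to the next higher value.
    for i in range(mid, len(values) - 1):
        if values[i + 1] - values[i] >= min_jump:
            upper_idx = i
            break
    return values[lower_idx], values[upper_idx]


# Made-up sample values, including a few obviously faulty readings.
sample = [-99.0, -6.2, -5.9, 3.0, 12.5, 20.1, 28.4, 37.8, 38.1, 70.0, 6999.0]
lower, upper = range_limits(sample)
# Range test: True where a reading lies inside the limits.
in_range = [lower <= t <= upper for t in sample]
```

With real 1-minute data the gaps between valid neighboring values are tiny, so a much smaller `min_jump` would likely be appropriate; the value here only suits the sparse sample.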

Different values were found for different stations; in some cases this could be attributed to station siting. For example, the weather station at Mt. Vernon is located at a relatively low elevation (16.04 m above sea level) in a river valley, while the station at Pascagoula is located about 2 km from the Gulf of Mexico.

If a sensor passes its range test, its data could still be bad for the following reasons:

- Small spikes could occur.

- Flat-lining at a valid temperature could occur.

- A temperature value that might be valid in winter could be bad in summer, because range test limits were not stratified by season.

Like-sensor tests are used to flag the above problems. To determine thresholds for the like-sensor tests, temperature differences for each pair of sensors were calculated. With four sensors there are six possible pairs, but only four were used: each sensor is compared to its nearest neighbor (0.5 m height difference) and to its same-model counterpart on the other crossbar (8 m height difference). Because the distributions of the differences were not normal, the limits were not set at 3 standard deviations from the mean (±3σ). Instead, the ends of the whiskers on the box plots were used as the lower and upper limits, because any points beyond them are defined as outliers.

The ends of the whiskers are defined as follows:

Lower whisker limit = 25th percentile - 1.5 × IQR

Upper whisker limit = 75th percentile + 1.5 × IQR
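As a rough illustration, the whisker limits and the resulting like-sensor flag could be computed as follows. The helper names and sample differences are hypothetical, and the percentile uses simple linear interpolation, which may differ slightly from the quantile method JMP applies.

```python
def percentile(sorted_vals, p):
    """Percentile by linear interpolation (an assumed method, not JMP's)."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def whisker_limits(diffs):
    """Return (lower, upper) whisker limits for sensor-pair differences."""
    vals = sorted(diffs)
    q1 = percentile(vals, 25)
    q3 = percentile(vals, 75)
    iqr = q3 - q1  # inter-quartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def like_sensor_flags(diffs, lower, upper):
    """True where a difference falls outside the whisker limits."""
    return [not (lower <= d <= upper) for d in diffs]


# Made-up temperature differences (deg C) for one sensor pair.
training_diffs = [0.0, 1.0, 2.0, 3.0, 4.0]
lower, upper = whisker_limits(training_diffs)
flags = like_sensor_flags([0.5, 7.0, -3.0], lower, upper)
```

For the sample above, the quartiles are 1.0 and 3.0, so the IQR is 2.0 and the limits are -2.0 and 6.0.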


Sensor pair      25th percentile   75th percentile   1.5 × IQR   Lower whisker   Upper whisker
1.5 m - 2 m
10 m - 2 m
1.5 m - 9.5 m
10 m - 9.5 m

Table 2: Temperature (°C) differences for four pairs of temperature probes at Grand Bay, Alabama.

As expected (Table 2), a larger inter-quartile range (IQR) is found for pairs of sensors spaced 8 m apart than for pairs spaced 0.5 m apart. Since the World Meteorological Organization (WMO) standard mounting height for temperature sensors is 1.25 to 2 m AGL and the HMP45C is considered the better-quality probe, the 2 m temperature sensor is taken as the primary sensor at each station. If data from this sensor pass all QC tests, they are used; otherwise data from its backup sensor at 1.5 m are used, provided that sensor also passes all QC tests. When the like-sensor test between the 2 and 1.5 m probes fails, either sensor could be bad. The remaining three like-sensor tests are used to identify which of the two probes (or both) is bad.
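The abstract does not spell out how the remaining like-sensor tests attribute a failure to a particular probe. One possible reading, sketched below purely as an assumption, is to treat a probe as suspect when it fails its range test or when both pair tests it takes part in fail. The sensor labels, function names, and readings are hypothetical.

```python
# The four sensor pairs compared in the like-sensor tests.
PAIRS = [("1.5m", "2m"), ("10m", "2m"), ("1.5m", "9.5m"), ("10m", "9.5m")]


def suspect_sensors(range_ok, pair_ok):
    """Assumed rule: a sensor is suspect if it fails the range test,
    or if every like-sensor test it participates in fails."""
    bad = {s for s, ok in range_ok.items() if not ok}
    for sensor in range_ok:
        tests = [ok for pair, ok in pair_ok.items() if sensor in pair]
        if tests and not any(tests):
            bad.add(sensor)
    return bad


def select_temperature(readings, range_ok, pair_ok):
    """Prefer the 2 m primary sensor, fall back to the 1.5 m backup."""
    bad = suspect_sensors(range_ok, pair_ok)
    for sensor in ("2m", "1.5m"):
        if sensor not in bad:
            return sensor, readings[sensor]
    return None, None  # neither near-surface sensor is usable


# Made-up example: the 2 m probe disagrees with both of its partners.
readings = {"1.5m": 21.3, "2m": 35.0, "9.5m": 21.0, "10m": 20.9}
range_ok = {s: True for s in readings}
pair_ok = dict(zip(PAIRS, [False, False, True, True]))
chosen = select_temperature(readings, range_ok, pair_ok)
```

Under this rule a lone failure of the 1.5 m versus 2 m test, with all other tests passing, leaves both probes unflagged; a real implementation would need an explicit policy for that ambiguous case.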

The presentation will report the percentages of flagged data and of type I and II errors. A type I error (or false positive) is detecting a bad temperature reading when in fact the reading is good, while a type II error (or false negative) is failing to detect a bad temperature reading. Means, medians, and extreme values of temperature will be compared to local climatological values from regional ASOS and COOP stations as well as nearby stations operated by the Dauphin Island Sea Lab and Auburn University.
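To make the two error types concrete, the short sketch below counts them from a list of QC flags and a list of independently verified labels. The function name, the choice of denominators (good readings for type I, bad readings for type II), and the sample lists are assumptions for illustration, not results from the study.

```python
def error_rates(flagged, truly_bad):
    """Return (type I rate, type II rate) as fractions.

    flagged[i]   -- True if the QC algorithm flagged reading i
    truly_bad[i] -- True if reading i is known to be bad
    """
    pairs = list(zip(flagged, truly_bad))
    false_pos = sum(1 for f, b in pairs if f and not b)  # type I errors
    false_neg = sum(1 for f, b in pairs if b and not f)  # type II errors
    n_good = sum(1 for _, b in pairs if not b)
    n_bad = sum(1 for _, b in pairs if b)
    type1 = false_pos / n_good if n_good else 0.0
    type2 = false_neg / n_bad if n_bad else 0.0
    return type1, type2


# Made-up flags and verified labels for five readings.
flagged = [True, False, False, True, False]
truly_bad = [True, False, True, False, False]
type1_rate, type2_rate = error_rates(flagged, truly_bad)
```

In this sample one of three good readings is wrongly flagged and one of two bad readings is missed.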

