J52.4 Improving the Accuracy of Cloud Detection Using Machine Learning

Thursday, 11 January 2018: 11:15 AM
Room 15 (ACC) (Austin, Texas)
Mary Ellen Craddock, Northrop Grumman Corporation, McLean, VA; and R. Alliss and M. Mason

Cloud detection from geostationary satellite imagery has long been accomplished by differencing multi-spectral channels and comparing the result with the signal expected from the Earth’s surface. Each pixel is then classified as clear or cloudy by comparing these differences to empirical thresholds. Using this methodology, the probability of detecting clouds exceeds 90%, but performance varies seasonally, regionally and temporally (Jedlovec, 2009; Saunders and Kriebel, 1988; Merchant et al., 2005; Jedlovec et al., 2008; Reuter et al., 2009). The Cloud Mask Generator (CMG) database developed under this effort consists of 20 years of 4 km, 15-minute clear/cloud images based on GOES data from 1997 to the present over CONUS and Hawaii. The algorithms that identify cloudy pixels in the imagery are based on well-known multi-spectral techniques and on thresholds defined for each area of interest. These thresholds were produced by manually studying thousands of images over 12 months, at a cost of thousands of man-hours, to determine where the algorithms succeeded and failed and to fine-tune the thresholds accordingly.
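The threshold approach described above can be illustrated with a minimal sketch. It is not the CMG algorithm: the channel pairing (a longwave and a shortwave infrared brightness temperature), the clear-sky background difference, the 2 K threshold and the function name `channel_difference_mask` are all assumptions chosen for illustration.

```python
import numpy as np


def channel_difference_mask(bt_long, bt_short, clear_sky_diff, threshold_k):
    """Return a boolean cloud mask from a two-channel difference test.

    A pixel is flagged cloudy when its channel difference departs from the
    expected clear-sky (surface) difference by more than threshold_k kelvin.
    All inputs are hypothetical brightness temperatures in kelvin.
    """
    diff = np.asarray(bt_long, dtype=float) - np.asarray(bt_short, dtype=float)
    return np.abs(diff - np.asarray(clear_sky_diff, dtype=float)) > threshold_k


# Made-up 2 x 2 scene: the right-hand column mimics cold cloud tops.
bt_long = np.array([[290.0, 275.0], [288.0, 260.0]])
bt_short = np.array([[289.5, 280.0], [287.8, 268.0]])
clear_sky_diff = np.full((2, 2), 0.3)  # assumed clear-sky background difference

mask = channel_difference_mask(bt_long, bt_short, clear_sky_diff, threshold_k=2.0)
print(mask)
```

In an operational setting the threshold would differ by region, season and time of day, which is the manual tuning burden the abstract describes.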

This study investigates the potential of improving cloud detection by using Random Forest (RF) ensemble classification. Random Forest (Breiman, 2001) is well suited to cloud detection because it runs efficiently on large datasets, is easily parallelized, is robust to outliers and noise, and is able to deal with highly correlated predictors, such as multi-spectral satellite imagery. The RF code was developed using Python 2.7 in about 4 weeks. Hawaii was selected as the region of focus, and the predictors include 1 km visible imagery along with 4 km infrared imagery, topography and multi-spectral image products. The cloud detection technique was developed in three steps. First, the RF models are tuned to identify the optimal number of trees and number of predictors to employ for both day and night scenes. Day and night were initially handled separately so that satellite image products specific to each period could be included as predictors. Second, the RF models are trained using the optimal number of trees and the number of randomly selected predictors identified during the tuning phase. Lastly, the model is used to predict clouds for a time period independent of the one used during tuning, and the predictions are compared to truth, the CMG cloud mask. Initial results show 97% accuracy during the daytime and 94% accuracy at night. Models that combine the day and night predictors and are run for all times achieved an accuracy of 95%. The total time to train, tune and test using RF was approximately one week.
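The three steps (tune, train, test on an independent period) can be sketched as follows. This is an illustration only, not the authors' code. It assumes scikit-learn and current Python, neither of which the abstract names beyond "Python 2.7". The synthetic predictors (visible reflectance, infrared brightness temperature, elevation), the grid values and the helper `make_synthetic_pixels` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)


def make_synthetic_pixels(n):
    """Generate hypothetical per-pixel predictors and clear/cloud labels."""
    cloudy = rng.integers(0, 2, size=n)  # 1 = cloud, 0 = clear ("truth" mask)
    visible = np.where(cloudy == 1, 0.6, 0.15) + rng.normal(0.0, 0.1, n)
    infrared = np.where(cloudy == 1, 255.0, 288.0) + rng.normal(0.0, 8.0, n)
    elevation = rng.uniform(0.0, 4000.0, n)  # topography, metres
    return np.column_stack([visible, infrared, elevation]), cloudy


# Separate samples stand in for the tuning/training period and the
# independent evaluation period.
X_train, y_train = make_synthetic_pixels(600)
X_test, y_test = make_synthetic_pixels(300)

# Step 1: tune the number of trees and the number of predictors per split.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_features": [1, 2]},
    cv=3,
)
search.fit(X_train, y_train)
best_params = search.best_params_

# Step 2: train a model with the tuned settings.
model = RandomForestClassifier(random_state=0, **best_params)
model.fit(X_train, y_train)

# Step 3: predict on the independent sample and compare to the truth labels.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(best_params, round(accuracy, 3))
```

In scikit-learn, `n_estimators` corresponds to the number of trees and `max_features` to the number of predictors considered at each split. Separate day and night models would repeat these steps with period-specific predictor sets.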

The improved performance and the reduced time needed to produce results are a testament not only to advances in computer technology over the last 20 years but also to machine learning as a more efficient and accurate methodology for cloud detection and, possibly, forecasting.
