This study investigates the potential for improving cloud detection using Random Forest (RF) ensemble classification. Random Forest (Breiman, 2001) is well suited to cloud detection because it runs efficiently on large datasets, is easily parallelized, is robust to outliers and noise, and can handle highly correlated predictors such as multi-spectral satellite imagery. The RF code was developed in Python 2.7 in about 4 weeks. The region of focus is Hawaii, and the predictors include 1 km visible imagery, 4 km infrared imagery, topography and multi-spectral image products. The cloud detection technique is developed in three steps. First, the RF models are tuned to identify the optimal number of trees and number of predictors for both day and night scenes. Day and night were initially handled separately so that satellite image products specific to those times could be included as predictors. Second, the RF models are trained using the optimal number of trees and the number of randomly selected predictors identified during the tuning phase. Lastly, the model is used to predict clouds for a time period independent of the one used during tuning, and the predictions are compared to truth, the CMG cloud mask. Initial results show 97% accuracy during the daytime and 94% accuracy at night. Models using both day and night predictors, run for all times, achieved an accuracy of 95%. The total time to tune, train and test using RF was approximately one week.
The improved performance and the reduced time to produce results are a testament not only to advances in computer technology over the last 20 years but also to the use of machine learning as a more efficient and accurate methodology for cloud detection and, possibly, forecasting.
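The three-step procedure described above (tune, train, test on an independent period) can be illustrated with a minimal sketch. It is not the study's code: it assumes the scikit-learn library and Python 3, whereas the source states only that Python 2.7 was used. The predictors and labels are synthetic stand-ins, and the names make_synthetic_scene, tune and run_workflow are hypothetical. Out-of-bag accuracy is used here as one possible tuning criterion; the criterion used in the study is not stated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def make_synthetic_scene(n_pixels, seed):
    """Hypothetical stand-in for per-pixel predictors and a reference cloud mask."""
    rng = np.random.default_rng(seed)
    visible = rng.uniform(0.0, 1.0, n_pixels)        # reflectance-like value
    infrared = rng.uniform(200.0, 300.0, n_pixels)   # brightness-temperature-like value (K)
    elevation = rng.uniform(0.0, 4000.0, n_pixels)   # topography-like value (m)
    X = np.column_stack([visible, infrared, elevation])
    # Toy labeling rule (an assumption): bright, cold pixels are cloudy.
    y = ((visible > 0.5) & (infrared < 270.0)).astype(int)
    # Flip 5% of labels to imitate noise in the reference mask.
    flip = rng.random(n_pixels) < 0.05
    y[flip] = 1 - y[flip]
    return X, y


def tune(X, y, tree_counts, feature_counts, seed=0):
    """Step 1: pick the number of trees and predictors per split by out-of-bag accuracy."""
    best = None
    for n_trees in tree_counts:
        for n_feat in feature_counts:
            rf = RandomForestClassifier(
                n_estimators=n_trees,
                max_features=n_feat,
                oob_score=True,
                random_state=seed,
            )
            rf.fit(X, y)
            if best is None or rf.oob_score_ > best[0]:
                best = (rf.oob_score_, n_trees, n_feat)
    return best


def run_workflow():
    # Separate seeds stand in for separate time periods.
    X_tune, y_tune = make_synthetic_scene(2000, seed=1)
    X_test, y_test = make_synthetic_scene(1000, seed=2)

    # Step 1: tuning.
    oob_accuracy, n_trees, n_feat = tune(
        X_tune, y_tune, tree_counts=(50, 100), feature_counts=(1, 2, 3)
    )

    # Step 2: training with the tuned settings.
    model = RandomForestClassifier(
        n_estimators=n_trees, max_features=n_feat, random_state=0
    )
    model.fit(X_tune, y_tune)

    # Step 3: prediction on the independent period, compared to the reference mask.
    test_accuracy = accuracy_score(y_test, model.predict(X_test))
    return {
        "n_trees": n_trees,
        "n_features": n_feat,
        "oob_accuracy": oob_accuracy,
        "test_accuracy": test_accuracy,
    }


if __name__ == "__main__":
    print(run_workflow())
```

In this sketch, n_estimators and max_features correspond to the "number of trees" and "number of predictors" named in the abstract. Accuracies from the synthetic data say nothing about the reported 97%, 94% and 95% figures.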