S7 Using Generative and Supervised Neural Networks for Thermal Image Analysis in an Urban Environment

Sunday, 28 January 2024
Hall E (The Baltimore Convention Center)
Shaunak Sharma, NASA, New York, NY; and T. Islam, K. Nielsen, J. A. Grey, A. L. Lofthouse, H. Norouzi, and R. Blake


While thermal cameras are essential for quantifying temperature variations across many applications, their high cost and coarse resolution, often lower than that of modern smartphone cameras, limit their use. In an urban heat island study correlating land surface temperature (LST) with air temperature, thermal imagery is restricted to street-level collection by physical constraints and regulations prohibiting drone-based platforms. Additionally, to differentiate surface types for further analysis, the visual counterparts of thermal images must be manually segmented, which is labor-intensive. This study proposes two neural networks: a Pix2Pix conditional Generative Adversarial Network (cGAN) to generate highly granular, realistic synthetic thermal images, and a Mask Region-based Convolutional Neural Network (Mask R-CNN) to segment the corresponding visual images by feature type. Unlike previous urban heat studies, this work pioneers the use of deep learning to generate realistic synthetic thermal imagery, with the segmented visual images providing novel training data. The Pix2Pix cGAN offers a distinct advantage: unlike a standard generative adversarial network (GAN), which learns to generate new data from random noise, it conditions its generation on specific input images, producing higher-quality, more controlled outputs.

Using Label Studio, an open-source annotation tool with a Python integration, we hand-segmented 30 visual images and exported the annotations as a COCO-format JSON file to train the Mask R-CNN. The dataset was split 80-10-10 into training, validation, and test images, and segmentation polygons were most precise at moderately high prediction-confidence values. On a cloud-hosted V100 GPU, each training epoch of 10,000 steps took an average of 35 seconds, after which a new synthetic image was produced; the synthetic thermal images generated by the Pix2Pix cGAN strongly resembled the color scheme and object placement of their infrared counterparts for 80-90% of the test set. While the discriminator and generator losses asymptotically approached zero and a maximum, respectively, only after roughly 100 epochs, training for 30-40 epochs with larger learning rates of 4e-3 and 1e-4 under an Adam optimizer produced higher-quality images. The generator's L1 loss showed a consistent pattern: regardless of the chosen learning rate, it fell below 0.40 after 20-25 epochs, indicating close pixel-to-pixel agreement between generated and ground-truth images.

Future work will quantitatively evaluate the synthetic infrared images by correlating color palettes with temperature values and will determine average LST for various surface types using Mask R-CNN segmentation overlays. Building on the trained networks, transfer learning will adapt these models to enhance existing k-nearest-neighbor architectures, supporting improved air-temperature modeling and high-resolution localized weather prediction.
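As a concrete illustration of the annotation pipeline, the following Python sketch splits a COCO-format Label Studio export 80-10-10 into train/validation/test subsets, matching the split described above. It is a minimal reconstruction: the file names, random seed, and output layout are assumptions, not details from the study.

    import json
    import random

    def split_coco(path, seed=0):
        """Split a COCO export into 80-10-10 train/val/test JSON files."""
        with open(path) as f:
            coco = json.load(f)
        images = list(coco["images"])
        random.Random(seed).shuffle(images)
        n_train = int(0.8 * len(images))
        n_val = int(0.1 * len(images))
        subsets = {
            "train": images[:n_train],
            "val": images[n_train:n_train + n_val],
            "test": images[n_train + n_val:],
        }
        for name, imgs in subsets.items():
            ids = {img["id"] for img in imgs}
            out = {
                "images": imgs,
                # Keep only annotations belonging to this subset's images.
                "annotations": [a for a in coco["annotations"]
                                if a["image_id"] in ids],
                "categories": coco["categories"],
            }
            with open(f"{name}.json", "w") as f:
                json.dump(out, f)

    # Hypothetical file name for the Label Studio COCO export.
    split_coco("label_studio_export.json")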
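The abstract does not specify the Mask R-CNN implementation; one common approach for a small COCO-format dataset is fine-tuning torchvision's pre-trained Mask R-CNN, sketched below. The class count, optimizer, and learning rate here are hypothetical placeholders, not values reported in the study.

    import torch
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
    from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

    NUM_CLASSES = 5  # assumption: background + 4 urban surface types

    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

    # Replace the box-classification head for the custom class count.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

    # Replace the mask-prediction head as well.
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, NUM_CLASSES)

    # Optimizer choice here is illustrative, not from the abstract.
    optimizer = torch.optim.SGD(model.parameters(), lr=5e-3, momentum=0.9)

    model.train()
    # `loader` would yield (images, targets) built from the COCO JSON,
    # where each target holds boxes, labels, and masks for one image:
    # for images, targets in loader:
    #     losses = model(images, targets)   # dict of box/class/mask losses
    #     loss = sum(losses.values())
    #     optimizer.zero_grad(); loss.backward(); optimizer.step()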
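For the image-translation side, a Pix2Pix training step combines a conditional adversarial loss with an L1 pixel loss, consistent with the L1 convergence reported above. In the sketch below, the stand-in generator and discriminator architectures and the L1 weight of 100 (the standard Pix2Pix default) are assumptions; the abstract reports Adam with learning rates of 4e-3 and 1e-4, and assigning the larger rate to the generator is a guess.

    import torch
    import torch.nn as nn

    netG = nn.Sequential(  # stand-in generator: RGB (3 ch) -> thermal (3 ch)
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())
    netD = nn.Sequential(  # stand-in PatchGAN: judges (RGB, thermal) pairs (6 ch)
        nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 1, 4, padding=1))

    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
    LAMBDA = 100  # standard Pix2Pix L1 weighting; not stated in the abstract

    opt_g = torch.optim.Adam(netG.parameters(), lr=4e-3, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(netD.parameters(), lr=1e-4, betas=(0.5, 0.999))

    def train_step(rgb, thermal):
        fake = netG(rgb)
        # Discriminator: push real pairs toward 1, generated pairs toward 0.
        pred_real = netD(torch.cat([rgb, thermal], 1))
        pred_fake = netD(torch.cat([rgb, fake.detach()], 1))
        loss_d = 0.5 * (bce(pred_real, torch.ones_like(pred_real))
                        + bce(pred_fake, torch.zeros_like(pred_fake)))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # Generator: fool the discriminator while staying close in L1.
        pred_fake = netD(torch.cat([rgb, fake], 1))
        loss_g = bce(pred_fake, torch.ones_like(pred_fake)) + LAMBDA * l1(fake, thermal)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()

    # Example: one step on a random 256x256 image pair.
    d_loss, g_loss = train_step(torch.randn(1, 3, 256, 256),
                                torch.randn(1, 3, 256, 256))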