Three-dimensional vertical velocity fields simulated by the Goddard Cumulus Ensemble (GCE) model at a horizontal resolution of 1 km are used to represent atmospheric convection. Individual 3D convective cells are segmented and labeled using two types of unsupervised machine learning algorithms for simulated scenes of mesoscale convective systems during the monsoon season in Darwin, Australia. In addition, idealized 3D cloud-resolving model simulations of single convective cells are used to augment the database. The unsupervised machine learning algorithms are k-means clustering and Voronoi-Otsu labeling (essentially a 3D watershed algorithm) as implemented in the open-source "pyclesperanto_prototype" package. Both unsupervised algorithms require human intervention when dealing with complicated merging and splitting of convective cells. Each of the three methods has a sample size of 2000 to 3000 cells. In this presentation, we compare and validate the characteristics of the segmented 3D convective cells, their mean behavior, and their associated vertical mass flow across the three methods. The validated convective cell database is intended as a training dataset both for testing general-purpose 3D image segmentation algorithms and for convective cell identification and tracking using a 3D Mask R-CNN algorithm.
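As a rough illustration of the k-means step and the per-cell vertical mass flow, the following is a minimal sketch and not the workflow used for the database. It assumes that clustering is done on the grid indices of voxels whose vertical velocity exceeds a threshold, which the abstract does not specify. The function names `kmeans_label` and `mass_flux_profile`, the 1 m/s threshold, the constant air density, and the synthetic two-updraft field are all hypothetical. Voronoi-Otsu labeling is not shown.

```python
import numpy as np

def kmeans_label(w, w_min, k, n_iter=50):
    """Label voxels with w > w_min into k clusters (1..k); 0 is background.

    Plain Lloyd's k-means on (z, y, x) grid indices, with a deterministic
    farthest-point initialisation so that repeated runs agree.
    """
    coords = np.argwhere(w > w_min).astype(float)  # shape (n_voxels, 3)
    if len(coords) < k:
        raise ValueError("fewer updraft voxels than requested clusters")

    # Farthest-point initialisation: start at the first voxel, then
    # repeatedly add the voxel farthest from all centres chosen so far.
    centers = [coords[0]]
    for _ in range(1, k):
        d2 = np.min([((coords - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(coords[np.argmax(d2)])
    centers = np.array(centers)

    def assign_to(c):
        # Index of the nearest centre for every voxel.
        d2 = ((coords[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        assign = assign_to(centers)
        new_centers = np.array([
            coords[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    assign = assign_to(centers)
    labels = np.zeros(w.shape, dtype=int)
    idx = coords.astype(int)
    labels[idx[:, 0], idx[:, 1], idx[:, 2]] = assign + 1
    return labels

def mass_flux_profile(w, labels, cell_id, rho, dx):
    """Vertical mass flux (kg/s) at each level for one labeled cell.

    Sums rho(z) * w * dx**2 over the voxels of the cell at each level;
    w has shape (nz, ny, nx), rho has shape (nz,), dx is in metres.
    """
    mask = labels == cell_id
    return rho * (w * mask).sum(axis=(1, 2)) * dx * dx

# Synthetic field (hypothetical): two separated updraft columns.
w = np.zeros((4, 20, 20))
w[:, 2:5, 2:5] = 5.0      # 5 m/s updraft
w[:, 14:17, 14:17] = 3.0  # 3 m/s updraft

labels = kmeans_label(w, w_min=1.0, k=2)
rho = np.ones(4)  # assumed constant density of 1 kg/m^3
flux_1 = mass_flux_profile(w, labels, labels[0, 3, 3], rho, dx=1000.0)
flux_2 = mass_flux_profile(w, labels, labels[0, 15, 15], rho, dx=1000.0)
```

In this sketch the number of clusters `k` must be supplied by the user, and clustering on position alone cannot separate cells that are merging or splitting, which is one place where manual intervention of the kind mentioned above would be needed.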

