To address these questions, the US Navy's 32-member regional ensemble system is used as a base ensemble to study the impact of ensemble size and initial perturbation calibration. Additional ensemble members are generated by adding different stochastic forcings to the horizontal and vertical mixing parameters in the Smagorinsky and Mellor-Yamada mixing parameterizations. Deterministic and probabilistic performance is evaluated for ensembles ranging from 32 to 160 members. Several key scores based on deterministic and probabilistic metrics are computed and compared against those of the 32-member base ensemble with a simple, efficient initial perturbation calibration. It is found that the calibrated 32-member ensemble can outperform the much more computationally demanding 160-member ensemble with respect to some metrics.
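
As a rough illustration of the kind of quantities involved, the sketch below generates stochastically perturbed mixing coefficients for a set of members and computes one deterministic score (RMSE of the ensemble mean), the ensemble spread, and one probabilistic score (the empirical CRPS). It is not the system's actual implementation: the lognormal multiplicative perturbation, the base coefficient value, and all function names are assumptions made for this example, and the specific metrics used in the study are not listed in this section.

```python
import math
import random
import statistics


def perturbed_coefficients(base_value, n_members, rel_std=0.1, seed=0):
    """Return one perturbed mixing coefficient per member.

    Assumption: a multiplicative lognormal factor (median 1) is used so the
    coefficient stays positive. The real forcing scheme may differ.
    """
    rng = random.Random(seed)
    return [base_value * math.exp(rng.gauss(0.0, rel_std))
            for _ in range(n_members)]


def ensemble_mean_rmse(forecasts, observations):
    """Deterministic score: RMSE of the ensemble mean.

    forecasts[t] is the list of member values at verification point t.
    """
    sq_err = [(statistics.fmean(members) - obs) ** 2
              for members, obs in zip(forecasts, observations)]
    return math.sqrt(statistics.fmean(sq_err))


def ensemble_spread(forecasts):
    """Square root of the mean (sample) ensemble variance over all points."""
    variances = [statistics.variance(members) for members in forecasts]
    return math.sqrt(statistics.fmean(variances))


def crps_ensemble(members, obs):
    """Probabilistic score: empirical CRPS for one observation.

    CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j| over all member pairs.
    """
    m = len(members)
    term_obs = sum(abs(x - obs) for x in members) / m
    term_pairs = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return term_obs - term_pairs


# Illustrative base value only; 32 and 160 are the ensemble sizes in the text.
coeffs_32 = perturbed_coefficients(0.1, 32)
coeffs_160 = perturbed_coefficients(0.1, 160)
```

Comparing ensemble-mean RMSE against spread is a common check of whether an ensemble is under- or over-dispersive, which is the kind of property an initial perturbation calibration aims to improve.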