To test each formulation, we gathered more than two million archived aircraft eddy dissipation rate (EDR) observations covering 100 days of the 2017-2018 winter and compared them to one-hour EDR forecasts computed from the Rapid Refresh (RAP) numerical forecast model. Six of the forecasts are based on gravity-wave-identifying formulations: the Lighthill-Ford equations, divergence tendency, frontogenesis, the Plougonven-Zhang equations, buoyancy advection, and acceleration divergence. We also tested the Graphical Turbulence Guidance, Version 3 (GTG3). As measured by the Heidke Skill Statistic, all the gravity wave formulations significantly outperformed GTG3, with acceleration divergence leading the way and scoring about ten times higher than GTG3. This was because more than 99% of the EDR observations were zero, whereas GTG3 forecast values greater than 0.1 EDR units for more than half of the observations. However, it is far more important to forecast significant turbulence than smooth conditions. To that end, we gathered 41 reports of significant, mostly injurious, clear-air turbulence (CAT) within the RAP domain from the Aviation Herald archives, beginning in 2010. Here, the Lighthill-Ford equations captured 82% of the events at a threshold of EDR > 0.4 units, while all the other forecasts captured fewer than half, including GTG3 with only 17% at the same threshold. Combining the Probability of Detection (POD) of significant turbulence with the POD(EDR = 0) from the 2017-2018 winter data yields a positive True Skill Statistic (TSS) for all the gravity wave forecasts, the highest being 0.69 for Lighthill-Ford. In contrast, because GTG3 is weak in both POD measures, its TSS is -0.36. We conclude that the gravity wave physical model performs much better than any statistical model such as GTG3.
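The way the two POD measures combine into a TSS can be illustrated with a short sketch. It assumes the standard two-category definition, TSS = POD(event) + POD(null) - 1, and it assumes that the 82% and 17% capture rates serve as the significant-turbulence PODs; the text does not state either point explicitly. The function names are hypothetical, and the null-event PODs the script prints are back-solved from the reported figures under those assumptions, not values taken from the study.

```python
def true_skill_statistic(pod_event: float, pod_null: float) -> float:
    """TSS under the standard definition: POD(event) + POD(null) - 1.

    pod_event: fraction of significant-turbulence cases that were forecast.
    pod_null:  fraction of smooth (EDR = 0) observations forecast as smooth.
    """
    for name, value in (("pod_event", pod_event), ("pod_null", pod_null)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return pod_event + pod_null - 1.0


def implied_null_pod(tss: float, pod_event: float) -> float:
    """Rearrange the same definition to recover POD(null) from TSS and POD(event)."""
    return tss + 1.0 - pod_event


if __name__ == "__main__":
    # (POD of significant turbulence, TSS) as reported in the text.
    reported = {
        "Lighthill-Ford": (0.82, 0.69),
        "GTG3": (0.17, -0.36),
    }
    for name, (pod_event, tss) in reported.items():
        # Assumption: the reported TSS follows the definition above.
        pod_null = implied_null_pod(tss, pod_event)
        print(f"{name}: implied POD(EDR = 0) is about {pod_null:.2f}")
```

Under these assumptions, a forecast scores a positive TSS only when its two detection rates sum to more than one, which is why a forecast that over-predicts turbulence in smooth air can end up negative even if it detects some significant events.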