For this paper we will use several thousand real-time-reporting raingage sites in the United States to investigate the latter of these sources of variability. First, we attempt to measure it directly, using bootstrapping methods to create a distribution of scores by resampling the total observation set numerous times. These results will be compared to the variability of the actual observations themselves and to verification scores based on alternate precipitation data (e.g., radar precipitation estimates and independent retrospective gage data). Second, recognizing that an important use of scores is as a test of model performance on different days with different precipitation regimes, we examine how the choice of score and of verification data affects the relative ranking of individual days during a several-month period. For both investigations, we will compute scores using precipitation forecasts from the Rapid Update Cycle (RUC) and Eta forecast models, verified against different combinations of individual raingage stations and against raingage-derived analyses computed from those same station combinations. Scores tested will include the Bias and the Equitable Skill Statistic, both of which are now routinely computed for the RUC by FSL's Real-Time Verification System (RTVS).
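To make the resampling procedure concrete, the following is a minimal sketch of the bootstrapping step described above, assuming matched forecast-observation pairs at gage sites and a single precipitation threshold. The function names, the use of station-level resampling with replacement, and the ETS-style formulation of the equitable score are illustrative assumptions and do not reproduce the RTVS implementation.

```python
import numpy as np

def contingency_scores(fcst, obs, threshold):
    """Bias and an ETS-style equitable score from a 2x2 contingency table
    of forecast/observed precipitation exceeding `threshold` (assumed forms)."""
    f = fcst >= threshold
    o = obs >= threshold
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    total = len(obs)
    bias = (hits + false_alarms) / (hits + misses)
    # Hits expected by chance, used in the equitable score
    hits_random = (hits + false_alarms) * (hits + misses) / total
    ets = (hits - hits_random) / (hits + false_alarms + misses - hits_random)
    return bias, ets

def bootstrap_scores(fcst, obs, threshold, n_resamples=1000, seed=0):
    """Resample forecast-observation pairs with replacement many times to
    build a distribution of verification scores."""
    rng = np.random.default_rng(seed)
    n = len(obs)
    scores = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)   # resample station pairs with replacement
        scores.append(contingency_scores(fcst[idx], obs[idx], threshold))
    return np.array(scores)   # shape (n_resamples, 2): columns are Bias, equitable score
```

The spread of the resulting score distribution (e.g., its percentile range) is one way to characterize the sampling component of score variability, which can then be compared with the day-to-day variability of the scores computed from the full observation set.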