Evaluating the fire behavior that WRF-Fire simulates is critical to its successful development and implementation. To test the performance of WRF-Fire, 11 fires in Colorado in 2016 were simulated. The fires were chosen based on the availability of observations, specifically on having at least two fire perimeter observations no more than 36 hours apart. For each case, the HRRR background weather analysis, the WRF weather analysis, and the WRF-Fire output were analyzed. The Model Evaluation Tools (MET) software package was used to analyze HRRR performance. To assess the WRF-Fire forecasts, model output for burned area was compared against observed boundary shapefiles using both contingency statistics and object-oriented calculations. During the evaluation, a number of observation issues arose, including a lack of consistency among observations, no central location for accessing observations, long periods between observation updates, and inaccuracies in the observed data. To overcome some of these issues, new techniques, such as using film capture in a Geographic Information System (GIS) platform, are being explored.
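As an illustration of the contingency statistics mentioned above, the following minimal sketch scores a modeled burned-area grid against an observed one. It assumes both perimeters have already been rasterized onto the same grid of 0/1 cells (1 = burned). The function name `contingency_stats`, the toy grids, and the particular scores reported (probability of detection, false alarm ratio, critical success index, frequency bias) are illustrative assumptions, not the evaluation code or the exact metrics used in this study.

```python
def contingency_stats(forecast, observed):
    """Build a 2x2 contingency table from two equal-shape binary grids.

    forecast, observed: lists of rows, each cell 1 (burned) or 0 (unburned).
    Hypothetical helper for illustration only.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for f_row, o_row in zip(forecast, observed):
        for f, o in zip(f_row, o_row):
            if f and o:
                hits += 1                # burned in both model and observation
            elif o:
                misses += 1              # observed burned, model unburned
            elif f:
                false_alarms += 1        # model burned, observed unburned
            else:
                correct_negatives += 1   # unburned in both

    def ratio(num, den):
        # Return NaN rather than raising when a score is undefined.
        return num / den if den else float("nan")

    return {
        "hits": hits,
        "misses": misses,
        "false_alarms": false_alarms,
        "correct_negatives": correct_negatives,
        "pod": ratio(hits, hits + misses),                  # probability of detection
        "far": ratio(false_alarms, hits + false_alarms),    # false alarm ratio
        "csi": ratio(hits, hits + misses + false_alarms),   # critical success index
        "bias": ratio(hits + false_alarms, hits + misses),  # frequency bias
    }


# Toy 3x4 grids (made-up values, not data from the study).
forecast = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
observed = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

stats = contingency_stats(forecast, observed)
print(stats)
```

For these toy grids the table holds 4 hits, 1 miss, 2 false alarms, and 5 correct negatives, so the frequency bias of 1.2 indicates that the modeled burned area is larger than the observed one. Cell-by-cell scores of this kind penalize small displacements of the fire perimeter twice (once as a miss, once as a false alarm), which is one reason object-oriented measures are often used alongside them.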