GCMs (CGCMs) is assessed and benchmarked against the skill of an empirical model, a Linear Inverse
Model (LIM). Both systems produce sea surface temperature (SST) and sea surface height (SSH)
forecasts. For seasonal forecast leads, the CGCM ensemble mean hindcasts come from the North
American Multimodel Ensemble (NMME), while for interannual-to-decadal leads the CMIP5
hindcasts are analyzed. The LIM, constructed from near-global (60°S-65°N) observed monthly
anomalies during the period 1961-2010, produces forecasts at leads from 1 month to 9 years.
The LIM skill is comparable to or better than that of the CGCM ensemble mean, as well as that of a local
univariate AR(1) process, at all timescales and all locations. Notably, the LIM is significantly
more skillful than the CGCM ensemble mean over the extratropics, especially at longer forecast
leads. For example, the LIM shows significant skill in predicting both the phase and amplitude of the
Pacific Decadal Oscillation (PDO) at leads of up to 6-9 yr, in contrast to the relatively low, insignificant skill in
Analysis of the LIM suggests that possible error in representing the SST-SSH coupling, rather
than uncertainty in forecast initialization, is a major cause of reduced SST and SSH skill in the
CGCMs. This also suggests that reducing this model error should improve model prediction skill
for seasonal-to-decadal SST and SSH anomalies.
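
To illustrate the kind of empirical benchmark described above, the following is a minimal sketch rather than the authors' implementation. It assumes the standard LIM formulation, in which the propagator at the training lag is estimated from lagged and contemporaneous covariances as G = C(lag) C(0)^-1 and a forecast at an integer multiple of that lag is obtained by repeated application of G. The function names (`fit_lim`, `lim_forecast`, `ar1_forecast`), the two-variable synthetic system `a_true`, and the noise model are hypothetical stand-ins for the SST and SSH anomaly state vector, chosen only so the example runs on its own.

```python
import numpy as np


def fit_lim(x, lag=1):
    """Estimate the lag-`lag` propagator G = C(lag) C(0)^-1.

    x has shape (time, state) and holds anomalies (e.g. monthly).
    """
    x = x - x.mean(axis=0)              # remove any residual mean
    x_now, x_later = x[:-lag], x[lag:]
    c0 = x_now.T @ x_now / len(x_now)       # contemporaneous covariance C(0)
    clag = x_later.T @ x_now / len(x_now)   # lagged covariance C(lag)
    return clag @ np.linalg.pinv(c0)


def lim_forecast(g, x_init, lead):
    """Forecast x(t + lead) = G^lead x(t); lead is in units of the training lag."""
    return np.linalg.matrix_power(g, lead) @ x_init


def ar1_forecast(x, x_init, lead):
    """Local univariate AR(1) benchmark: each variable is damped independently
    by its own lag-1 autocorrelation raised to the power `lead`."""
    x = x - x.mean(axis=0)
    r1 = (x[1:] * x[:-1]).sum(axis=0) / (x[:-1] ** 2).sum(axis=0)
    return r1 ** lead * x_init


# Synthetic stand-in for an anomaly time series (hypothetical, not observed data):
# a stable two-variable linear system driven by white noise.
rng = np.random.default_rng(0)
a_true = np.array([[0.9, 0.2],
                   [-0.1, 0.8]])
n_steps = 20000
series = np.zeros((n_steps, 2))
for t in range(1, n_steps):
    series[t] = a_true @ series[t - 1] + rng.standard_normal(2)

g_hat = fit_lim(series, lag=1)          # should be close to a_true
x_init = np.array([1.0, -0.5])
print("estimated propagator:\n", g_hat)
print("LIM forecast, lead 12: ", lim_forecast(g_hat, x_init, 12))
print("AR(1) forecast, lead 12:", ar1_forecast(series, x_init, 12))
```

The multivariate propagator retains the coupling between state variables, whereas the AR(1) benchmark treats each variable in isolation; comparing the two forecasts against held-out data is one way to quantify how much skill the coupling contributes.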