First, an index is proposed that measures the disparity between the nonlinear principal components u and u′ for a data point x and its nearest neighbor x′. This index, I = 1 − C (where C is the Spearman rank correlation between u and u′), tends to increase for overfitted solutions. Among NLPCA models with various amounts of flexibility, selecting the one that minimizes the information criterion H (= MSE × I) automatically yields the model with the right amount of flexibility. Tests using autoassociative neural networks for NLPCA on synthetic and real climate data (including the tropical Pacific sea surface temperature and the North American winter surface air temperature) give very good results. The information criterion also automatically chooses between an open and a closed curve fit for a dataset.
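The two quantities above, I = 1 − C and H = MSE × I, can be computed directly once an NLPCA fit has produced a principal component value u for every data point. The sketch below is one plausible reading of those definitions, not the author's code. It assumes Euclidean distance in data space for finding the nearest neighbor x′, average ranks for ties in the Spearman correlation, and an MSE already computed from the fit. All function names (`nearest_neighbors`, `ranks`, `spearman`, `inconsistency_index`, `information_criterion`, `select_model`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def nearest_neighbors(points: Sequence[Point]) -> List[int]:
    """For each point, the index of its nearest *other* point (assumed Euclidean, brute force)."""
    nn = []
    for i, p in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, q in enumerate(points):
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if d < best_d:  # strict '<': among equidistant neighbors the lowest index wins
                best_j, best_d = j, d
        nn.append(best_j)
    return nn


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks (assumed tie rule)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks (undefined for constant input)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


def inconsistency_index(points: Sequence[Point], u: Sequence[float]) -> float:
    """I = 1 - C, where C correlates u with u', the NLPC of each point's nearest neighbor."""
    u_prime = [u[j] for j in nearest_neighbors(points)]
    return 1.0 - spearman(u, u_prime)


def information_criterion(mse: float, index: float) -> float:
    """H = MSE * I."""
    return mse * index


def select_model(points: Sequence[Point],
                 candidates: Dict[str, Tuple[float, Sequence[float]]]) -> str:
    """Return the name of the candidate (mse, u) with the smallest H (hypothetical helper)."""
    return min(
        candidates,
        key=lambda name: information_criterion(
            candidates[name][0], inconsistency_index(points, candidates[name][1])
        ),
    )
```

As a usage example, take points lying on a straight line. If u varies smoothly along the line, neighboring points get similar u, so C is close to 1 and I is close to 0. If u zigzags, as an overfitted curve might, neighbors get very different u and I can exceed 1. `select_model` can then prefer the smooth fit even when the zigzag fit has a lower MSE. The brute-force O(n²) neighbor search keeps the sketch dependency-free. For large datasets a spatial index such as a k-d tree would be the usual choice.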