In our work, neural network (NN) methods have been developed for classifying and forecasting the observed precipitation type at the ground. NNs can act as universal approximators of non-linear functions and can therefore be used to assess the dynamics of complex systems. For such systems, NNs have become a useful tool where correct phenomenological models are not available, or where uncertainty in the process model and in the input data complicates the application of deterministic modelling. The NN parameters were obtained by a training procedure based on an efficient unconstrained minimization algorithm.
For NN simulations, the optimization of the input patterns is a strategic point. Data optimization is necessary in order to select the patterns and variables that are most meaningful for explaining the data variability related to NN model performance.
Methodologies
First, for this competition, we examined the bivariate frequency distribution of each variable versus the target (observed precipitation type). These distributions show that the temperature and the wind components have a significant influence on the target variable.
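A bivariate frequency distribution of this kind can be sketched as a table of counts over (variable bin, precipitation class) pairs. The example below is only an illustration: the function name `bivariate_frequency`, the temperature values, the bin edges, and the integer class codes are all assumptions and are not taken from the competition data.

```python
import numpy as np

def bivariate_frequency(values, target, bin_edges, n_classes):
    """Count how often each (value bin, target class) pair occurs."""
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=int)
    # np.digitize returns 1-based bin numbers; shift them to 0-based indices
    bin_index = np.digitize(values, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    # Ignore values that fall outside the outermost bin edges
    inside = (bin_index >= 0) & (bin_index < n_bins)
    table = np.zeros((n_bins, n_classes), dtype=int)
    # np.add.at accumulates correctly when the same cell is hit repeatedly
    np.add.at(table, (bin_index[inside], target[inside]), 1)
    return table

# Hypothetical data: temperatures and integer-coded precipitation types
temperature = [-5.0, -3.0, -1.0, 0.5, 2.0, 4.0, 6.0, 8.0]
precip_type = [2, 2, 2, 1, 1, 0, 0, 0]
table = bivariate_frequency(temperature, precip_type, [-10.0, 0.0, 5.0, 10.0], 3)
print(table)
```

Each row of the table is one temperature bin and each column one class, so a variable that influences the target shows rows with clearly different class profiles.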
Then, we applied a novel approach that combines optimization of the neural network architecture with:
1. A data mining technique for optimizing the patterns, in order to improve the training of the network.
2. Cluster analysis, employed as a pre-processor to simplify the characterization of the dataset.
Data mining is a powerful tool for investigating complex systems such as meteorological problems: it supports the analysis of large and complicated data, the extraction of implicit and potentially useful knowledge from complex datasets, and the identification of processes or relationships that are not completely understood. An important part of data mining is data pre-processing, which cleans the selected data, ensures data quality, optimizes the computational efficiency of data processing, enables real-time mining results, enhances the method's efficiency, and plays an important role in data description by identifying the main variables and structures in the dataset. We applied a data pre-processing methodology in both the variable space and the pattern space.
In particular, in the pattern space we applied a resampling technique to increase the weight of patterns related to outliers, which generally improves NN training. It allows a very large number of samples to be obtained from the initial dataset and improves the input information for the NN. The optimal number of samples depends on several factors, in particular on the distribution of the starting patterns.
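One possible form of such a resampling step is a weighted bootstrap, sketched below. The abstract does not specify the weighting scheme, so the choice made here (sampling with replacement, with each pattern weighted by the inverse frequency of its class) is an assumption, as are the function name `resample_patterns` and the synthetic data.

```python
import numpy as np

def resample_patterns(X, y, n_samples, seed=0):
    """Draw a bootstrap sample in which rare classes are up-weighted."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    # Assumed scheme: weight each pattern by 1 / (size of its class)
    _, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    weights = 1.0 / counts[inverse]
    probabilities = weights / weights.sum()
    # Sample pattern indices with replacement
    index = rng.choice(len(y), size=n_samples, replace=True, p=probabilities)
    return X[index], y[index]

# Hypothetical imbalanced dataset: 90 patterns of class 0, 10 of class 1
X = np.arange(100, dtype=float).reshape(100, 1)
y = np.array([0] * 90 + [1] * 10)
X_resampled, y_resampled = resample_patterns(X, y, n_samples=1000)
print("share of rare class after resampling:", (y_resampled == 1).mean())
```

With these weights every class is drawn with equal total probability, so the rare class makes up roughly half of the resampled set instead of a tenth.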
The second form of data pre-processing uses cluster analysis, an important technique for discovering inherent structure in data that requires no further assumptions or a priori knowledge. The purpose of partitioning a dataset of objects into k separate clusters is to find clusters whose members show a high degree of similarity among themselves but dissimilarity with the members of other clusters. In this way, it is possible to generate a small number of groups that represent (summarize) the dataset.
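The abstract does not name the clustering algorithm. As an illustration of partitioning into k clusters, the sketch below uses k-means (Lloyd's algorithm), which is an assumption; the function name `kmeans` and the toy data are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Partition the rows of X into k clusters with Lloyd's algorithm."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct patterns chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every pattern to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Hypothetical data: two well-separated groups of three patterns each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centers, labels = kmeans(X, k=2)
print(labels)
print(centers)
```

The k cluster centers can then stand in for the full dataset, which is the sense in which clustering summarizes the data and reduces the input requirement.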
At the end of the pre-processing step, we applied the neural network model; the Multi-Layer Perceptron (MLP) was considered the most suitable architecture.
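For readers unfamiliar with the MLP, the forward pass of a network with one hidden layer can be sketched as follows. The layer sizes, the tanh hidden activation, the softmax output, and the random weights are assumptions chosen for illustration; the abstract does not state the architecture actually used, and training is not shown.

```python
import numpy as np

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Return class probabilities from a one-hidden-layer perceptron."""
    hidden = np.tanh(x @ w_hidden + b_hidden)
    logits = hidden @ w_out + b_out
    # Subtract the row maximum for a numerically stable softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 4, 3, 3  # hypothetical sizes
w_hidden = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(scale=0.5, size=(n_hidden, n_classes))
b_out = np.zeros(n_classes)

x = rng.normal(size=(5, n_inputs))  # five hypothetical input patterns
probs = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
predicted_type = probs.argmax(axis=1)
print(predicted_type)
```

The number of hidden units (`n_hidden` here) controls the number of free parameters, which is why a small hidden layer keeps the computational complexity low.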
With data pre-processing, it is possible to obtain more accurate and meaningful results because the NN input data are highly reliable. During the NN training phase, a large number of input variables can provide an accurate description of the problem being considered, but it yields an over-parameterized model and requires more computational processing time, and often more data, to capture effectively the relationship between inputs and outputs.
For a neural network, resolving this trade-off is one of the most important tasks to be solved in order to achieve a high degree of accuracy during the generalization phase of the model with a low number of hidden units and, as a consequence, minimal computational complexity. In particular, with the resampling technique (a computer-intensive method), it is possible to recalibrate the NN.
Results and Discussions
The adopted model shows good performance. Our results based on data mining show that a multi-category form of the Peirce Skill Score for the observed precipitation type ranged between 0.65 and 0.85 in the training phase. The quality of the precipitation type classification depends strongly on the selection of patterns during the training phase, and the results are related to the statistical distribution of the input/output data set.
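For reference, the multi-category Peirce Skill Score can be computed from a contingency table as the proportion correct minus the proportion expected by chance, normalized by one minus the sum of squared observed frequencies. The sketch below assumes that rows are forecast categories and columns are observed categories; the function name and the example tables are hypothetical and are not the results reported here.

```python
import numpy as np

def peirce_skill_score(contingency):
    """Multi-category Peirce Skill Score (rows: forecast, columns: observed)."""
    c = np.asarray(contingency, dtype=float)
    n = c.sum()
    proportion_correct = np.trace(c) / n
    forecast_freq = c.sum(axis=1) / n  # marginal frequency of each forecast class
    observed_freq = c.sum(axis=0) / n  # marginal frequency of each observed class
    chance = (forecast_freq * observed_freq).sum()
    return (proportion_correct - chance) / (1.0 - (observed_freq ** 2).sum())

# Hypothetical two-category tables
perfect = peirce_skill_score([[50, 0], [0, 50]])
constant = peirce_skill_score([[50, 50], [0, 0]])  # always forecasts class 0
partial = peirce_skill_score([[40, 10], [10, 40]])
print(perfect, constant, partial)
```

A perfect classification scores 1, while a constant or random forecast scores 0, so values between 0.65 and 0.85 indicate substantial skill over chance.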
Simulations based on the cluster analysis results demonstrate that this method is feasible and effective, resulting in a substantial reduction of the input data requirement.