This talk is part two of a two-part series highlighting different machine learning approaches to classification. Here we focus on Support Vector Machine (SVM) classifiers using linear and Radial Basis Function (RBF) kernels. The general method for training an SVM consists of three steps. First, the labeled data are partitioned into training and testing sets at a 2-to-1 ratio. To prepare the training data for learning, each variable is independently scaled to zero mean and unit variance, and Principal Component Analysis (PCA) is performed on the scaled data; only the top 50 components are kept, lowering the dimensionality by a factor of 5. Next, cross-validation selects the optimal hyper-parameters for each SVM kernel, and the optimal SVMs are trained on the prepared data through supervised learning. Lastly, each trained model classifies the testing data, which are transformed in the same manner as the training data; a confusion matrix is built from the resulting predictions, from which a Peirce Skill Score is easily computed.
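The workflow above can be sketched with scikit-learn. This is a minimal illustration, not the authors' actual code: the dataset is synthetic, and the hyper-parameter grid values are assumptions chosen for demonstration. The pipeline guarantees the test data are scaled and projected exactly as the training data were.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Synthetic stand-in for the labeled data; 250 features so that keeping
# 50 principal components reduces dimensionality by a factor of 5.
X, y = make_classification(n_samples=600, n_features=250,
                           n_informative=40, random_state=0)

# Step 1: partition into training and testing sets at a 2-to-1 ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0)

# Step 2: scale each variable to zero mean / unit variance, keep the top
# 50 principal components, then cross-validate SVM hyper-parameters
# (the grid values below are illustrative, not from the talk).
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=50)),
    ("svm", SVC(kernel="rbf")),
])
param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": ["scale", 0.01]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)

# Step 3: classify the test set (transformed identically by the pipeline)
# and compute the Peirce Skill Score from the confusion matrix:
# PSS = hit rate - false alarm rate = TP/(TP+FN) - FP/(FP+TN).
tn, fp, fn, tp = confusion_matrix(y_test, search.predict(X_test)).ravel()
pss = tp / (tp + fn) - fp / (fp + tn)
print(f"Peirce Skill Score: {pss:.3f}")
```

The Peirce Skill Score is bounded in [-1, 1], with 0 meaning no skill beyond chance, which is why it is a convenient single-number summary of the confusion matrix.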
Presently, we find that both SVM kernels produce Peirce scores in the low-to-mid 0.50s: SVMs with linear kernels score up to 0.57, while SVMs with RBF kernels score up to 0.53. Additionally, performing PCA before training greatly accelerates the training process and yields a much better Peirce score (+0.18 on average). Future work may investigate deep learning techniques such as neural networks to see whether they produce higher scores.