# A Study of Feature Extraction Techniques Based on Decision Border Estimate

Title | A Study of Feature Extraction Techniques Based on Decision Border Estimate |

Publication Type | Conference Proceedings |

Year of Conference | 2007 |

Authors | Diamantini C, Potena D |

Conference Name | COMPUTATIONAL METHODS OF FEATURE SELECTION |

Pagination | 109 - 129 |

Date Published | 2007 |

Publisher | CHAPMAN & HALL/CRC |

Abstract | Feature extraction is the core of methodologies aimed at building new and more expressive features from the existing ones. This change of representation typically brings out characteristics of the data that are not immediately evident in the original space. As a consequence, performance can be improved at the expense of reduced interpretability by domain experts. Feature extraction can be considered as a mapping from the original space to a lower-dimensional feature space. The mapping can be carried out with respect to different criteria, which can be roughly divided into data representation and data discrimination criteria. In the former case, the goal is to find the set of reduced features that best approximates the original data, so the criteria are based on the minimization of a mean-squared error or distortion measure. One of the best-known methods based on this criterion is Principal Component Analysis (PCA), or Karhunen-Loeve expansion, which calculates the eigenvalues and eigenvectors of the data covariance matrix and defines the mapping as an orthonormal transformation based on the set of eigenvectors corresponding to the highest eigenvalues. The squared error of the transformation is simply the sum of the leftover eigenvalues. PCA is an optimal method for data compression and signal representation; however, it has several limitations for discriminating between data belonging to different classes. In particular, for data discrimination, the criterion used to evaluate the effectiveness of features should be a measure of class separability. For this task, the Bayes error probability is the best criterion to evaluate a feature set. Unfortunately, the Bayes error is unknown in general. A family of methods that is frequently used in practice, but that is only indirectly related to the Bayes error, is Discriminant Analysis (DA), based on a family of functions of scatter matrices.
In the simplest form, Linear DA (LDA), also known as Canonical Analysis (CA), considers a within-class scatter matrix for each class, measuring the scatter of samples around the respective class mean, and the between-class scatter matrix, measuring the scatter of class means around the mixture mean, and finds a transformation that maximizes the between-class scatter and minimizes the within-class scatter, so that class separability is maximized in the reduced-dimensional space. Other approaches use upper bounds of the Bayes error, like the Bhattacharyya distance. Lee and Landgrebe introduced the principle that, in classification tasks, the relevance of features can be measured on the basis of properties of the decision border, the geometrical locus of points of the feature space separating one class from the others. Following this approach, some authors proposed the use of Artificial Neural Networks (ANNs) to estimate the unknown decision border. In early works, authors suggested the use of the Multi-Layer Perceptron, the most widely used type of feedforward ANN, consisting of multiple layers of interconnected neurons. More recently, the use of SVMs targeted at the accurate estimate of the optimal decision border was proposed. We propose a truly Bayesian approach to feature extraction for classification, based on an appropriately trained Labeled Vector Quantizer (LVQ). We call the approach truly Bayesian since the LVQ is trained with the Bayes risk weighted Vector Quantization (BVQ) learning algorithm, which is, to the best of our knowledge, the only learning algorithm based on the minimization of the misclassification risk. Under this truly classification-based algorithm, an LVQ moves towards a locally optimal linear approximation of the Bayesian decision border. In this chapter we present these approaches and compare them. |
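
The PCA and LDA descriptions in the abstract can be illustrated with a minimal NumPy sketch. This is not code from the chapter: the function names (`pca_fit`, `pca_reconstruction_error`, `lda_directions`) and the synthetic two-class data are assumptions made for illustration, and the LDA variant shown (eigenvectors of the inverse within-class scatter times the between-class scatter) is only one common formulation of the scatter-matrix criteria. The sketch checks that the PCA mean-squared error equals the sum of the leftover eigenvalues, and shows a case where the leading PCA direction ignores the class structure while the LDA direction follows it.

```python
import numpy as np

def pca_fit(X, k):
    """Return (mean, top-k eigenvectors, all eigenvalues sorted descending)."""
    mean = X.mean(axis=0)
    # Population covariance (divide by n) so the error identity below is exact.
    cov = np.cov(X - mean, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return mean, eigvecs[:, :k], eigvals

def pca_reconstruction_error(X, mean, components):
    """Mean squared reconstruction error after projecting onto `components`."""
    Z = (X - mean) @ components
    X_hat = Z @ components.T + mean
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

def lda_directions(X, y, k):
    """Top-k eigenvectors of S_w^{-1} S_b (one common LDA formulation)."""
    d = X.shape[1]
    mixture_mean = X.mean(axis=0)
    S_w = np.zeros((d, d))   # within-class scatter, summed over classes
    S_b = np.zeros((d, d))   # between-class scatter of class means
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mixture_mean)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:k]].real

# Synthetic data (an assumption): classes differ along axis 0,
# but most of the variance lies along axis 1.
rng = np.random.default_rng(0)
scale = np.array([0.5, 3.0, 0.5])
X0 = rng.normal(size=(200, 3)) * scale
X1 = rng.normal(size=(200, 3)) * scale + np.array([2.0, 0.0, 0.0])
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

mean, components, eigvals = pca_fit(X, k=1)
pca_error = pca_reconstruction_error(X, mean, components)
lda_w = lda_directions(X, y, k=1)[:, 0]

print("PCA error:", pca_error, "sum of leftover eigenvalues:", eigvals[1:].sum())
print("leading PCA direction:", components[:, 0])
print("leading LDA direction:", lda_w)
```

On data of this kind the leading PCA direction aligns with the high-variance axis, whereas the LDA direction aligns with the axis along which the class means differ, which is the limitation of PCA for discrimination that the abstract refers to.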