Online feature selection and classification with incomplete data

This paper presents a classification system in which learning, feature selection, and classification for incomplete data are carried out simultaneously in an online manner. Learning is conducted on a predefined model consisting of the class-dependent mean vectors and correlation coefficients, which are obtained by incrementally processing the incoming observations with missing features. A nearest-neighbor classifier based on a Gaussian mixture model, whose parameters are also estimated from the trained model, is used for classification. When a test observation is received, the algorithm discards the missing attributes of the observation and ranks the available features by performing feature selection on the model trained so far. The developed algorithm is tested on a benchmark dataset. The effect of missing features on online feature selection and classification is discussed and presented. The algorithm converges quickly to a stable feature ranking, with accuracy similar to that obtained on the complete feature set even when up to 50% of the data are missing.
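The incremental learning step described above can be sketched as follows. This is an illustrative sketch only, not the paper's actual implementation: the class name, attribute names, and data layout are assumptions. It maintains per-feature running means and pairwise product sums for one class, updating only the attributes that are observed in each incoming sample (missing attributes are marked with `NaN`), from which class-dependent means and correlation coefficients can later be read off.

```python
import numpy as np

class IncrementalClassStats:
    """Running per-feature means and pairwise co-occurrence statistics
    for one class, updated one observation at a time while skipping
    missing features (hypothetical sketch of the online learning step)."""

    def __init__(self, n_features):
        self.n = np.zeros(n_features)                       # per-feature observation counts
        self.mean = np.zeros(n_features)                    # per-feature running means
        self.pair_n = np.zeros((n_features, n_features))    # pairwise observation counts
        self.sum_xy = np.zeros((n_features, n_features))    # pairwise sums of products

    def update(self, x):
        """x: 1-D float array; np.nan marks a missing attribute."""
        idx = np.where(~np.isnan(x))[0]
        # incremental mean update restricted to the observed features
        self.n[idx] += 1
        self.mean[idx] += (x[idx] - self.mean[idx]) / self.n[idx]
        # accumulate product sums for later correlation estimates
        for i in idx:
            for j in idx:
                self.pair_n[i, j] += 1
                self.sum_xy[i, j] += x[i] * x[j]
```

Because each attribute keeps its own count, a feature that is missing in an observation simply leaves that feature's statistics untouched, which is what allows the model to be trained without imputing or discarding incomplete samples.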
