Dynamically updated diversified ensemble-based approach for handling concept drift

Concept drift is the phenomenon in which the underlying data distribution changes unexpectedly over time. Examining such drifts and gaining insight into the processes executing at that instant is a major challenge. Prediction models should be capable of handling drifts in scenarios where statistical properties change abruptly. Various strategies exist in the literature for such scenarios, but the majority of them are limited to identifying one particular kind of drift pattern. The proposed approach combines online drift detection in a diversified adaptive setting with pruning techniques to formulate a concept drift handling approach, named ensemble-based online diversified drift detection (En-ODDD), which aims to identify the majority of drift types, including abrupt, gradual, recurring, and mixed drifts, in a single model. En-ODDD is equipped with a dynamically updated ensemble to speed up adaptation to changing distributions. Unlike prevalent approaches, which do not consider correlations between experts, En-ODDD trains its experts on varying randomized subsets of the input data. Different levels of sampling have been applied to generate diversity and promote generalization. The effectiveness of the proposed approach has been evaluated in terms of prediction accuracy using the Massive Online Analysis software and compared with ten state-of-the-art algorithms. Experimental results on fifteen benchmark datasets (artificial and real-world) with up to one million instances show that En-ODDD outperforms the existing approaches irrespective of the nature of the drift.
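The idea of training experts on differently sampled subsets of a stream can be illustrated with a minimal sketch. It is not the En-ODDD algorithm, whose sampling, drift detection, and pruning steps are not specified in this abstract. It assumes Poisson-weighted online bagging in the style of Oza, with a different (hypothetical) rate `lam` per expert standing in for "different levels of sampling". The class names `MajorityClassExpert` and `DiverseOnlineEnsemble` and the chosen rates are illustrative assumptions.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class MajorityClassExpert:
    """Toy incremental learner (assumption): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y, weight=1):
        self.counts[y] += weight

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class DiverseOnlineEnsemble:
    """Each expert sees every instance k ~ Poisson(lam_i) times.

    A low rate gives a sparsely trained expert and a high rate a densely
    trained one, so the experts end up with different views of the stream.
    """

    def __init__(self, lambdas, seed=0):
        self.rng = random.Random(seed)
        self.lambdas = list(lambdas)
        self.experts = [MajorityClassExpert() for _ in self.lambdas]

    def learn(self, x, y):
        for expert, lam in zip(self.experts, self.lambdas):
            k = poisson(lam, self.rng)
            if k > 0:  # k == 0 means this expert skips the instance
                expert.learn(x, y, weight=k)

    def predict(self, x):
        # Unweighted majority vote over experts that have seen any data.
        votes = Counter(e.predict(x) for e in self.experts)
        votes.pop(None, None)
        return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    ensemble = DiverseOnlineEnsemble(lambdas=[0.1, 0.5, 1.0])
    for i in range(200):
        ensemble.learn(x=i, y="a")
    print(ensemble.predict(x=0))
```

In a full system, a drift detector would monitor the error stream and trigger replacement or pruning of experts; that part is omitted here because the abstract gives no detail on it.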
