An unsupervised heterogeneous log-based framework for anomaly detection

Abstract: Log analysis is a method for identifying intrusions at the host or network level by scrutinizing the log events recorded by operating systems, applications, and devices. Most existing work considers a single type of log, which gives an incomplete picture of the situation and makes it difficult to decide whether an intrusion has occurred. Moreover, most existing detection methods are knowledge-dependent, i.e. they rely on either the characteristics of an anomaly or a baseline of normal traffic behavior, which limits detection to anomalies covered by the acquired knowledge. To discover a wide range of anomalies by scrutinizing various logs, this paper presents a new unsupervised framework, UHAD, which uses a two-step strategy to cluster the log events and then applies a filtering threshold to reduce the volume of events for analysis. Events from heterogeneous logs are assembled into a common format and analyzed based on their features to identify anomalies. The clustering accuracy of K-means, expectation-maximization, and farthest first was compared, and the impact of clustering was captured in all subsequent phases. Even though log events pass through several phases in UHAD before being judged anomalous, experiments show that the choice of clustering algorithm and filtering threshold significantly influences the decision. The framework detected the majority of anomalies by relating events from heterogeneous logs. Specifically, with K-means and expectation-maximization, the framework detected an average of 87.26% and 85.24% of anomalous events, respectively, across various subsets.
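The cluster-then-filter idea in the abstract can be illustrated with a minimal sketch. This is not the UHAD implementation: the abstract does not give the feature set, the distance measure, or the exact filtering rule. The sketch assumes numeric features in a common format, Euclidean distance, a simple farthest-first traversal as the clustering step, and a hypothetical rule that keeps only events in clusters whose share of all events falls below a threshold. The function names and toy data are illustrative only.

```python
from collections import Counter
from math import dist


def farthest_first_centers(points, k):
    """Pick k centers: start with the first point, then repeatedly add the
    point that is farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers))
        )
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        for p in points
    ]


def filter_small_clusters(labels, threshold):
    """Hypothetical filtering rule: return the indices of events that sit in
    clusters holding less than `threshold` of all events."""
    sizes = Counter(labels)
    total = len(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] / total < threshold]


# Toy events already in a common format: (hour of day, failed-login count).
events = [(9, 0), (10, 1), (9, 1), (10, 0), (11, 1), (3, 40)]

centers = farthest_first_centers(events, k=2)
labels = assign(events, centers)
candidates = filter_small_clusters(labels, threshold=0.25)
```

With this toy data, `candidates` holds the index of the single event that lands in its own small cluster. Changing `k` or `threshold` changes which events survive the filter, which is the kind of sensitivity to the clustering algorithm and threshold that the abstract reports.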

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK