An unsupervised heterogeneous log-based framework for anomaly detection
Abstract: Log analysis identifies intrusions at the host or network level by scrutinizing the log events recorded by operating systems, applications, and devices. Most existing work considers only a single type of log, which gives an incomplete picture of the situation and makes it difficult to decide whether an intrusion has occurred. Moreover, most existing detection methods are knowledge-dependent, i.e. they rely on either the characteristics of known anomalies or a baseline of normal traffic behavior, which limits detection to anomalies covered by the acquired knowledge. To discover a wide range of anomalies by scrutinizing various logs, this paper presents a new unsupervised framework, UHAD, which uses a two-step strategy to cluster the log events and then applies a filtering threshold to reduce the volume of events for analysis. Events from heterogeneous logs are assembled into a common format and analyzed based on their features to identify anomalies. The clustering accuracy of K-means, expectation-maximization, and farthest first was compared, and the impact of clustering was captured in all subsequent phases. Even though log events pass through several phases in UHAD before being classified as anomalous, experiments have shown that the choice of clustering algorithm and filtering threshold significantly influences the decision. The framework detected the majority of anomalies by relating events from heterogeneous logs. Specifically, with K-means and expectation-maximization, the framework detected an average of 87.26% and 85.24% of anomalous events, respectively, across various subsets.
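The cluster-then-filter idea described in the abstract can be illustrated with a minimal sketch. It is not the UHAD implementation. It assumes that events from different logs have already been mapped to a common format with numeric features, and that filtering discards events in clusters whose share of all events exceeds the threshold. The abstract does not state how the threshold is applied, so this rule, the field names `log` and `features`, the function names, and the sample values are all hypothetical. The K-means seeds are picked by a farthest-point rule purely to keep the example deterministic.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    # Deterministic seeding (illustrative choice): start from the first point,
    # then repeatedly add the point farthest from the seeds chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def filter_events(events, labels, threshold):
    """Keep only events in clusters holding at most `threshold` of all events.

    Assumption: sparse clusters are the candidates for further analysis.
    """
    sizes = Counter(labels)
    total = len(events)
    return [e for e, lab in zip(events, labels) if sizes[lab] / total <= threshold]

# Hypothetical events from heterogeneous logs, already in a common format
# with two normalized numeric features each.
events = [
    {"log": "syslog", "features": [0.10, 0.10]},
    {"log": "syslog", "features": [0.12, 0.11]},
    {"log": "web", "features": [0.11, 0.09]},
    {"log": "web", "features": [0.09, 0.12]},
    {"log": "auth", "features": [0.10, 0.11]},
    {"log": "auth", "features": [0.13, 0.10]},
    {"log": "firewall", "features": [0.11, 0.12]},
    {"log": "firewall", "features": [0.12, 0.09]},
    {"log": "firewall", "features": [0.90, 0.95]},
    {"log": "auth", "features": [0.92, 0.90]},
]

labels = kmeans([e["features"] for e in events], k=2)
kept = filter_events(events, labels, threshold=0.3)
for event in kept:
    print(event["log"], event["features"])
```

In this toy data the eight similar events form one dense cluster and are filtered out, while the two outlying events, which come from different logs, remain for later analysis. Raising or lowering `threshold` changes how many clusters survive, which mirrors the abstract's remark that the filtering threshold influences the final decision.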