Yusuf ALACA, Yuksel CELIK, Sanjay GOEL

Anomaly Detection in Cyber Security with Graph-Based LSTM in Log Analysis

Intrusion detection systems analyze log data to detect anomalies. However, detecting anomalies quickly and effectively in large, heterogeneous log data is challenging. To address this difficulty, this study proposes the GLSTM (Graph-based Long Short-Term Memory) framework, a graph-based deep learning model that analyzes log data to detect cyber-attacks rapidly and effectively. The framework standardizes the complex and diverse log data, trains an artificial intelligence model on the standardized data, and detects anomalies. First, the log data are transformed into graph data using Node2Vec, which enables efficient and rapid analysis by the model. Next, models are trained on these graph data using the LSTM (Long Short-Term Memory), Bi-LSTM, and GRU (Gated Recurrent Unit) deep learning algorithms. The proposed framework is tested on Hadoop's HDFS dataset, collected from different systems and heterogeneous sources, as well as on the BGL and IMDB datasets. Experimental results on the selected datasets demonstrate high levels of success.
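The abstract does not state how the log graph is constructed or how Node2Vec is configured, so the following is only a minimal sketch of the walk-generation step under stated assumptions: nodes are log event IDs, an undirected edge links two events that appear consecutively in the same session, and walks follow the second-order bias of node2vec (return parameter `p`, in-out parameter `q`) as described by Grover and Leskovec. The event IDs, session lists, and the function names `build_event_graph` and `node2vec_walk` are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def build_event_graph(sessions):
    """Build an undirected graph whose nodes are event IDs.

    Assumption: two events are linked when they occur consecutively
    in the same session. The paper may construct its graph differently.
    """
    graph = defaultdict(set)
    for events in sessions:
        for a, b in zip(events, events[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Generate one node2vec-style biased random walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph.get(cur, ()))
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in graph.get(prev, ()):
                weights.append(1.0)       # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)   # move farther away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical sessions of parsed log event IDs (illustrative only).
sessions = [
    ["E5", "E22", "E11", "E9"],
    ["E5", "E22", "E26", "E9"],
    ["E5", "E11", "E9", "E23"],
]
graph = build_event_graph(sessions)
rng = random.Random(0)  # fixed seed so repeated runs give the same walks
walks = [node2vec_walk(graph, node, 5, p=0.5, q=2.0, rng=rng)
         for node in sorted(graph)]
```

In a complete pipeline, walks like these would typically be passed to a skip-gram model to obtain node embeddings, and sequences of those embeddings would then be fed to the recurrent model (LSTM, Bi-LSTM, or GRU); both steps are omitted here because they need external libraries and details the abstract does not give.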

Keywords:

Anomaly Detection, Graph, Node2Vec, Deep Learning, Cyber Security, HDFS

