Anomaly Detection in Cyber Security with Graph-Based LSTM in Log Analysis
Intrusion detection systems analyze log data to detect anomalies. However, detecting anomalies quickly and effectively in large, heterogeneous log data is challenging. To address this, this study proposes GLSTM (Graph-based Long Short-Term Memory), a graph-based deep learning framework that analyzes log data to detect cyber-attacks rapidly and effectively. The framework standardizes complex and diverse log data, trains an artificial intelligence model on the standardized data, and uses the trained model to detect anomalies. First, the log data is transformed into graph data and embedded with Node2Vec, which enables efficient and rapid analysis by the model. The resulting graph data are then used to train LSTM (Long Short-Term Memory), Bi-LSTM (Bidirectional LSTM), and GRU (Gated Recurrent Unit) deep learning models. The proposed framework is evaluated on Hadoop's HDFS dataset, which is collected from different systems and heterogeneous sources, as well as on the BGL and IMDB datasets. Experimental results on these datasets show high detection performance.
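The graph step above can be illustrated with a minimal sketch. It assumes that log messages have already been parsed into event-template IDs and grouped into sessions; the graph construction (nodes are event IDs, edge weights count consecutive occurrences) and the event IDs themselves are hypothetical and not necessarily how GLSTM defines its graph. What the code does implement is Node2Vec's second-order biased random walk, with return parameter `p` and in-out parameter `q`.

```python
import random
from collections import defaultdict


def build_transition_graph(sessions):
    """Hypothetical graph construction: nodes are log event IDs, and an
    undirected edge's weight counts how often two events occur back to back."""
    graph = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            if a == b:
                continue  # ignore self-loops to keep the sketch simple
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One second-order biased random walk as defined by node2vec.

    p is the return parameter and q the in-out parameter: stepping back to
    the previous node is weighted by 1/p, moving to a node adjacent to the
    previous node by 1, and moving further away by 1/q (each multiplied by
    the edge weight)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(graph[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so sample by edge weight only.
            weights = [graph[cur][x] for x in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for x in neighbors:
                w = graph[cur][x]
                if x == prev:
                    weights.append(w / p)  # return to the previous node
                elif x in graph[prev]:
                    weights.append(w)      # x is also adjacent to prev
                else:
                    weights.append(w / q)  # move outward, away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, shuffling start order each pass."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


# Toy sessions of hypothetical event-template IDs (not taken from HDFS).
sessions = [
    ["E5", "E22", "E11", "E9"],
    ["E5", "E22", "E9"],
    ["E5", "E11", "E9", "E26"],
]
g = build_transition_graph(sessions)
walks = generate_walks(g, num_walks=2, length=5, p=1.0, q=0.5)
print(walks[0])
```

Each walk is a sequence of event IDs. In node2vec these walks are treated as sentences and passed to a skip-gram model (for example gensim's `Word2Vec`) to learn one vector per node; how GLSTM then arranges these vectors into recurrent-network inputs is not specified here.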
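For reference, the recurrent units named above are usually written as follows. This is the widely used LSTM variant with a forget gate and the standard GRU; the exact configuration used in GLSTM (layer sizes, number of layers, loss, and how anomalies are scored) is not given here. Let $x_t$ be the $t$-th input vector (assumed here to be a Node2Vec embedding), $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
% LSTM
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\[4pt]
% GRU
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

A Bi-LSTM runs one LSTM forward and one backward over the sequence and concatenates their hidden states, $h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$. Some references swap the roles of $z_t$ and $1 - z_t$ in the GRU update; the two conventions are equivalent up to relabeling the gate.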