Havadan Karaya Görev Yapan İHA Filosu için Derin Pekiştirmeli Öğrenme Tabanlı İşbirlikçi Beka Optimizasyonu

Çalışma kapsamında, hasıma ait radar ve silah sistemlerinin bulunduğu bir ortamda operasyon yapan İHA takımının işbirlikçi strateji geliştirmesine odaklanılmıştır. Hasım savunma sisteminin benzetimini yapmak için Markov modelleri sunulmuştur. İlgili modeller, radar sisteminin tespit ve takip olasılıklarını üretebilmekte ve hasım ortamında uçuş yapan hava araçlarının çoklu atış bekalarını hesaplayabilmektedir. Bir derin pekiştirmeli öğrenme metodu olan Proksimal Politika Optimizasyonu algoritması vasıtasıyla bir işbirlikçi strateji geliştirme yöntemi sunulmuştur. Önerilen pekiştirmeli öğrenme yapısıyla eğitimin gerçekleştirilmesinin ardından, İHA takımının rakibin zayıflıklarından faydalanarak takım bekasını en iyileyen işbirlikçi stratejiler geliştirebildiği gösterilmiştir.

Deep Reinforcement Learning-based Cooperative Survivability Maximization for a UAV Fleet on an Air-to-Ground Mission

This study focuses on the cooperative strategy development of a UAV team that operates in a hostile environment in which radar and weapon systems try to track and eliminate it. To simulate the hostile defense system, we present Markov models that generate the detection and tracking probabilities of a radar system and calculate the multiple-shot survivability of air vehicles flying within the hostile environment. A cooperative strategy development procedure is presented based on the proximal policy optimization (PPO) algorithm, a deep reinforcement learning method. It is shown that, after training with the proposed reinforcement learning scheme, the UAV team can develop cooperative strategies that exploit the enemy's weaknesses to maximize team survivability in an air-to-ground mission.
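To illustrate the kind of Markov model the abstract refers to, the sketch below propagates a single UAV through a simple five-state discrete-time survivability chain (undetected, detected, tracked, engaged, killed). The state set, transition probabilities, and scan-step interpretation are all illustrative assumptions, not the paper's actual model:

```python
import numpy as np

# Hypothetical five-state survivability chain for one UAV against a
# radar/SAM site. States: 0 undetected, 1 detected, 2 tracked,
# 3 engaged, 4 killed (absorbing). All numbers are illustrative.
P = np.array([
    [0.70, 0.30, 0.00, 0.00, 0.00],  # undetected -> may be detected
    [0.10, 0.50, 0.40, 0.00, 0.00],  # detected   -> may be tracked
    [0.00, 0.10, 0.50, 0.40, 0.00],  # tracked    -> may be engaged
    [0.00, 0.00, 0.20, 0.50, 0.30],  # engaged    -> may be killed
    [0.00, 0.00, 0.00, 0.00, 1.00],  # killed is absorbing
])

def survivability(steps: int) -> float:
    """Probability the UAV is still alive after `steps` radar scans."""
    dist = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # start undetected
    dist = dist @ np.linalg.matrix_power(P, steps)
    return 1.0 - dist[4]  # alive = not in the absorbing 'killed' state

print(round(survivability(10), 3))
```

Because the killed state is absorbing, survivability is monotonically non-increasing in the number of scans; multiple-shot survivability emerges naturally as repeated passes through the engaged state.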
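The PPO algorithm cited above optimizes a clipped surrogate objective. A minimal NumPy sketch of that loss is shown below; the full training loop (actor-critic networks, advantage estimation, multi-agent rollouts) used in the paper is omitted, and the sample inputs are invented for illustration:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss from Schulman et al. (2017).

    ratio     : pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage : estimated advantage of each sampled action
    Returns the negative clipped objective (a loss to minimize).
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum keeps the update pessimistic,
    # discouraging policy steps that move the ratio outside [1-eps, 1+eps].
    return -np.mean(np.minimum(unclipped, clipped))

# Illustrative call: a ratio of 2.0 with positive advantage is clipped to 1.2.
print(ppo_clip_loss(np.array([2.0]), np.array([1.0])))  # -> -1.2
```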
