Multiagent Q-learning based UAV trajectory planning for effective situational awareness

In the event of a natural disaster, the arrival time of search and rescue (SAR) teams at the affected areas is of vital importance for saving the lives of victims. In particular, when an earthquake strikes a geographically large area, reconnaissance of the debris within a short time is critical for conducting successful SAR missions. Unmanned aerial vehicles (UAVs) can provide quick and effective situational awareness in postdisaster scenarios. However, off-the-shelf UAVs suffer from limited communication range as well as limited airborne duration due to battery constraints. If the telecommunication infrastructure is destroyed in such a disaster, the maximum area that a ground station (GS) can monitor through UAVs is limited to a single UAV's wireless coverage, regardless of how many UAVs are deployed. Additionally, performing a blind search within the affected area could induce significant delays in SAR missions and thus lead to inefficient use of the limited battery energy. To address these issues, we develop a multiagent Q-learning based trajectory planning algorithm that maintains all-time connectivity to the GS in a multihop manner and enables the UAVs to observe as many critical (i.e. highly populated) areas as possible. Comprehensive experimental results demonstrate that the proposed multiagent Q-learning algorithm attains UAV trajectories that cover a significantly larger portion of the critical areas, up to 43% more than that achieved by existing algorithms such as extended versions of the Monte Carlo, greedy, and random algorithms.
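To make the core idea concrete, below is a minimal, illustrative Python sketch of independent per-UAV Q-learning on a grid, with a reward that favors newly covered critical cells and a multihop connectivity check toward the GS. All names and parameters here (GRID, COMM_RANGE, the reward shaping, the learning constants) are our own assumptions for illustration; they do not reproduce the paper's exact state space, reward function, or training setup.

```python
import random
from collections import defaultdict

# Illustrative sketch only: independent multiagent Q-learning on a grid.
# GRID, COMM_RANGE, and the reward shaping are assumed values, not the
# paper's exact formulation.
GRID = 10          # environment is GRID x GRID cells
COMM_RANGE = 3     # max relay distance (Chebyshev, in cells) per hop
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # N, S, E, W, hover
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def connected_to_gs(positions, gs=(0, 0)):
    """True if every UAV reaches the GS through a multihop relay chain."""
    linked, pending, grown = {gs}, set(positions), True
    while grown:
        grown = False
        for p in list(pending):
            if any(max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= COMM_RANGE
                   for q in linked):
                linked.add(p); pending.discard(p); grown = True
    return not pending

def step(positions, actions, visited, critical):
    """Apply joint action; reject moves that break GS connectivity."""
    new_pos = []
    for (x, y), (dx, dy) in zip(positions, actions):
        new_pos.append((min(max(x + dx, 0), GRID - 1),
                        min(max(y + dy, 0), GRID - 1)))
    if not connected_to_gs(new_pos):
        return positions, -1.0          # penalty: connectivity violated
    gained = sum(1 for p in new_pos if p in critical and p not in visited)
    visited.update(new_pos)
    return new_pos, float(gained)       # team reward: new critical cells

def train(n_uavs=3, episodes=500, horizon=40):
    critical = {(random.randrange(GRID), random.randrange(GRID))
                for _ in range(20)}     # assumed "highly populated" cells
    Q = [defaultdict(float) for _ in range(n_uavs)]  # one table per agent
    for _ in range(episodes):
        positions, visited = [(0, 0)] * n_uavs, set()
        for _ in range(horizon):
            acts = []
            for i, pos in enumerate(positions):
                if random.random() < EPS:          # epsilon-greedy
                    acts.append(random.choice(ACTIONS))
                else:
                    acts.append(max(ACTIONS, key=lambda a: Q[i][(pos, a)]))
            new_pos, r = step(positions, acts, visited, critical)
            for i, (pos, a) in enumerate(zip(positions, acts)):
                best_next = max(Q[i][(new_pos[i], b)] for b in ACTIONS)
                Q[i][(pos, a)] += ALPHA * (r + GAMMA * best_next
                                           - Q[i][(pos, a)])
            positions = new_pos
    return Q

if __name__ == "__main__":
    train()
```

In this sketch each UAV learns its own Q-table from a shared team reward, which is one common way to decentralize multiagent Q-learning; the connectivity constraint is enforced by rejecting joint moves that would disconnect any UAV from the GS relay chain.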
