Multi-Robot Navigation in an Unknown Environment Based on Deep Reinforcement Learning

Solving the navigation problem of mobile robots with deep reinforcement learning (DRL) has become a topic of growing interest. Traditional approaches to multi-robot navigation combine many algorithms, both for navigation itself and for coordination among the robots, which makes them costly; DRL is simpler and cheaper by comparison, and for a single mobile robot it has already reached high success rates. In multi-robot systems, however, the complexity of the problem grows exponentially, making it a much more expensive and challenging task. This study addresses the multi-robot navigation problem with DRL. In the proposed approach, a synchronous environment contains multiple robots, targets, and obstacles. The robots select actions and move in turn, and each robot also acts as a dynamic obstacle for the other robots. Every robot tries to reach its own target along the shortest path without any collision, and plans its route so that it neither collides with another robot nor lengthens another robot's path. To achieve this, a multi-agent deep Q-network (DQN) algorithm, a target-oriented state representation, and an enhanced adaptive reward mechanism are used. The resulting system was evaluated in terms of the navigation success of a single robot, the navigation success of the multi-robot system, and the success rate as a function of the number of robots per unit square. These evaluations confirm the performance of the proposed approach.
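The abstract does not specify the exact state encoding or reward terms, so the sketch below is only an illustration of what a target-oriented state vector and an adaptive, progress-based reward could look like in a grid world. Every name and constant here (the 3x3 obstacle window, the reward weights) is an assumption for illustration, not the paper's definition:

```python
import numpy as np

def target_oriented_state(robot_pos, goal_pos, occupancy):
    """Illustrative target-oriented state: the offset to the robot's own goal
    plus a flattened 3x3 window of the occupancy grid around the robot."""
    dr, dc = goal_pos[0] - robot_pos[0], goal_pos[1] - robot_pos[1]
    padded = np.pad(occupancy, 1, constant_values=1)  # treat the border as walls
    r, c = robot_pos
    local = padded[r:r + 3, c:c + 3].flatten()        # 3x3 neighborhood (padded coords)
    return np.concatenate(([dr, dc], local)).astype(np.float32)

def adaptive_reward(old_dist, new_dist, collided, reached):
    """Illustrative shaped reward: terminal bonuses plus a progress term that
    rewards shrinking the (e.g. Manhattan) distance to the goal, with a small
    per-step cost so detours that lengthen the route are penalized."""
    if reached:
        return 10.0
    if collided:
        return -10.0
    return 1.0 * (old_dist - new_dist) - 0.1
```

An "adaptive" mechanism could, for instance, scale these weights with the remaining distance to the goal or the local robot density; the abstract leaves those details to the full paper.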

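The turn-based interaction described above, in which robots act in a fixed order and each treats the others as dynamic obstacles, can be sketched as follows. The epsilon-greedy selection and the q_values callable standing in for the trained multi-agent DQN are illustrative assumptions:

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step_all_robots(robots, goals, static_obstacles, q_values, epsilon=0.1):
    """One environment step: robots pick actions in a fixed turn order.

    robots: list of (row, col) positions, updated in place as each robot moves,
            so a robot that has already moved this step blocks the ones after it.
    q_values: callable(state) -> list of four action values; a stand-in for the
              trained multi-agent DQN.
    """
    for i, pos in enumerate(robots):
        # From robot i's point of view, every other robot is a dynamic obstacle.
        blocked = set(static_obstacles) | {p for j, p in enumerate(robots) if j != i}
        state = (pos, goals[i], tuple(sorted(blocked)))
        if random.random() < epsilon:                  # epsilon-greedy exploration
            action = random.randrange(len(ACTIONS))
        else:
            action = max(range(len(ACTIONS)), key=lambda a: q_values(state)[a])
        nxt = (pos[0] + ACTIONS[action][0], pos[1] + ACTIONS[action][1])
        if nxt not in blocked:                         # a blocked move leaves the robot in place
            robots[i] = nxt
```

With a dummy value function such as lambda s: [0, 0, 0, 0] this reduces to a random walk; in training, each robot's transition (state, action, reward, next state) would be stored in a replay buffer and used to update the DQN.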
