An effective method for using centralized Q-learning in multi-robot task allocation

Applying Q-learning in multi-robot systems is challenging. Such systems are dynamic and partially observable because each robot makes decisions and acts independently, whereas Q-learning is theoretically defined for Markovian environments. One way to apply Q-learning in multi-robot systems is centralized learning, which learns optimal Q-values over the state space of the overall system and the joint action space of all robots. In this case, the system can be treated as stationary and convergence to optimal solutions is possible. However, centralized learning requires full knowledge of the environment, reliable inter-robot communication, and substantial computational power. Especially for large systems, the computational cost becomes prohibitive because the size of the learning space grows exponentially with the number of robots. The approach proposed in this study, subG-CQL, divides the overall system into small sub-groups without adversely affecting its task-performing abilities. Each sub-group consists of fewer robots performing fewer tasks and learns in a centralized manner for its own team. Thus, the size of the learning space is reduced to a reasonable level, and the required communication is limited to the robots within the same sub-group. Because centralized learning is used, successful results are expected. Experimental studies show that the proposed algorithm improves the task assignment performance of the system and enables efficient use of system resources.
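
To make the scale argument concrete, the following is a minimal, illustrative sketch of centralized tabular Q-learning over the joint action space of a single sub-group. It is not the paper's subG-CQL implementation; the class name SubGroupQLearner, the state encoding, and the hyper-parameters are assumptions introduced only to show why the joint Q-table stays tractable when each sub-group contains few robots.

import itertools
import random

import numpy as np


class SubGroupQLearner:
    """Illustrative centralized tabular Q-learning over one sub-group's joint action space."""

    def __init__(self, n_states, n_robot_actions, n_robots,
                 alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Joint actions are all combinations of the robots' individual actions,
        # so the table holds n_states * (n_robot_actions ** n_robots) values.
        # Keeping n_robots small per sub-group keeps this product manageable.
        self.joint_actions = list(itertools.product(range(n_robot_actions),
                                                    repeat=n_robots))
        self.q = np.zeros((n_states, len(self.joint_actions)))

    def select_action(self, state):
        # Epsilon-greedy choice over the sub-group's joint actions.
        if random.random() < self.epsilon:
            return random.randrange(len(self.joint_actions))
        return int(np.argmax(self.q[state]))

    def update(self, state, action_idx, reward, next_state):
        # Standard one-step Q-learning update applied to the joint table.
        best_next = np.max(self.q[next_state])
        td_error = reward + self.gamma * best_next - self.q[state, action_idx]
        self.q[state, action_idx] += self.alpha * td_error

For example, with 4 individual actions per robot, a sub-group of 3 robots needs only 4^3 = 64 joint actions per state, whereas a single centralized learner for 9 robots would need 4^9 = 262,144, which is the exponential growth the sub-grouping is meant to avoid.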
