Reinforcement learning-based mobile robot navigation

In recent decades, reinforcement learning (RL) has been widely used in research fields ranging from psychology to computer science. The infeasibility of sampling every possibility in continuous-state problems and the absence of an explicit teacher make RL algorithms preferable to supervised learning in the machine learning area, and the optimal control problem has accordingly become a popular subject of research. In this study, a system is proposed to solve mobile robot navigation using the two most popular RL algorithms, Sarsa(λ) and Q(λ). The proposed system, developed in MATLAB, uses state and action sets defined in a novel way to increase performance. The system can guide the mobile robot to a desired goal while avoiding obstacles, with a high success rate in both simulated and real environments. Additionally, the system makes it possible to observe the effects of the initial parameters used by the RL methods (e.g., λ) on learning and to compare the performance of the Sarsa(λ) and Q(λ) algorithms.
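
To make the role of the trace-decay parameter λ concrete, the sketch below shows a minimal tabular Sarsa(λ) learner with accumulating eligibility traces on a toy grid world. It is a Python illustration only; the grid size, reward values, goal cell, and ε-greedy exploration settings are placeholder assumptions and do not reproduce the paper's MATLAB system or its novel state and action set definitions.

```python
import numpy as np

# Minimal illustrative tabular Sarsa(lambda) on a 5x5 grid world.
# The state/action discretization here is a placeholder, not the
# paper's state and action set definition.

N_STATES = 25          # 5x5 grid of cells (assumption)
N_ACTIONS = 4          # up, down, left, right
ALPHA, GAMMA, LAM = 0.1, 0.95, 0.9   # learning rate, discount, trace decay
EPSILON = 0.1          # exploration rate for the epsilon-greedy policy
GOAL = 24              # goal cell index (assumption for this sketch)

def step(state, action):
    """Deterministic grid transition; reward +1 at the goal, -0.01 otherwise."""
    row, col = divmod(state, 5)
    if action == 0:   row = max(row - 1, 0)
    elif action == 1: row = min(row + 1, 4)
    elif action == 2: col = max(col - 1, 0)
    else:             col = min(col + 1, 4)
    next_state = row * 5 + col
    done = next_state == GOAL
    return next_state, (1.0 if done else -0.01), done

def epsilon_greedy(Q, state, rng):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def train(episodes=500, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        E = np.zeros_like(Q)             # eligibility traces, reset each episode
        state = 0
        action = epsilon_greedy(Q, state, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(Q, next_state, rng)
            # On-policy TD error: bootstrap on the action actually selected next.
            delta = reward + GAMMA * Q[next_state, next_action] * (not done) - Q[state, action]
            E[state, action] += 1.0      # accumulating trace
            Q += ALPHA * delta * E       # update all recently visited state-action pairs
            E *= GAMMA * LAM             # decay traces by gamma * lambda
            state, action = next_state, next_action
    return Q
```

Watkins's Q(λ) differs mainly in that the TD target bootstraps on max_a Q(s', a) and the traces are reset to zero whenever an exploratory (non-greedy) action is taken.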
