Adaptive Control of an Inverted Pendulum by a Reinforcement Learning-based LQR Method

Inverted pendulums are among the most popular systems for benchmarking control algorithms. Several methods have been proposed for controlling this system, most of which rely on the availability of a mathematical model. However, deriving such a model from physical parameters or through system identification requires manual effort. Moreover, the resulting controllers may perform poorly if the system parameters change. To mitigate these problems, some recent studies have used Reinforcement Learning (RL) based approaches to control inverted pendulum systems. Unfortunately, these methods suffer from slow convergence and local minima. They may also require hyperparameter tuning, which complicates the design process significantly. To alleviate these problems, the present study proposes an LQR-based RL method for adaptive balancing control of an inverted pendulum. Numerical experiments show that the algorithm stabilizes the system quickly without requiring a mathematical model or extensive hyperparameter tuning. In addition, it can adapt to parametric changes online.

Keywords:

Reinforcement learning, LQR, Inverted pendulum, Q-learning, Adaptive control
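
To make the idea of an LQR-based RL method concrete, the following is a minimal sketch of model-free Q-learning policy iteration for a discrete-time LQR problem, in the style of Bradtke's Q-learning for linear quadratic regulation. It is not the algorithm of this paper, whose details are not given in the abstract. Everything in it is an assumption made for illustration: the Euler-discretized linearized pendulum (`A`, `B`, `dt`, `g_over_l`), the cost weights `Qc` and `R`, the initial stabilizing gain `K0`, and the helper names. The learner only calls `step(x, u)` and never reads `A` or `B`.

```python
import numpy as np

def quad_features(z):
    """Upper-triangular entries of z z^T (off-diagonals doubled),
    so that z^T H z == quad_features(z) @ h for symmetric H."""
    outer = np.outer(z, z)
    iu = np.triu_indices(len(z))
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)
    return outer[iu] * scale

def unpack_symmetric(h, dim):
    """Rebuild the symmetric matrix H from its upper-triangular entries."""
    H = np.zeros((dim, dim))
    H[np.triu_indices(dim)] = h
    return H + H.T - np.diag(np.diag(H))

def q_learning_lqr(step, K, Qc, R, n_iters=10, n_samples=60,
                   noise_std=1.0, seed=0):
    """Policy iteration on the quadratic Q-function Q(x, u) = z^T H z,
    z = [x; u], using only sampled transitions from step(x, u).
    Assumes the initial gain K is stabilizing."""
    rng = np.random.default_rng(seed)
    n, m = Qc.shape[0], R.shape[0]
    for _ in range(n_iters):
        regressors, costs = [], []
        x = 0.1 * rng.normal(size=n)
        for _ in range(n_samples):
            # Behaviour action: current policy plus exploration noise.
            u = -K @ x + noise_std * rng.normal(size=m)
            x_next = step(x, u)
            u_next = -K @ x_next  # target action follows the current policy
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, u_next])
            # Bellman equation: z^T H z - z_next^T H z_next = stage cost.
            regressors.append(quad_features(z) - quad_features(z_next))
            costs.append(x @ Qc @ x + u @ R @ u)
            x = x_next
        # Policy evaluation: least-squares fit of H.
        h, *_ = np.linalg.lstsq(np.array(regressors), np.array(costs),
                                rcond=None)
        H = unpack_symmetric(h, n + m)
        # Policy improvement: u = -H_uu^{-1} H_ux x minimizes z^T H z.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K

# Hypothetical plant: pendulum linearized about the upright position,
# state x = [angle, angular velocity], Euler-discretized with step dt.
dt, g_over_l = 0.05, 9.81
A = np.array([[1.0, dt], [g_over_l * dt, 1.0]])
B = np.array([[0.0], [dt]])
Qc, R = np.eye(2), np.eye(1)

def step(x, u):
    """Black-box simulator; the learner never inspects A or B."""
    return A @ x + B @ u

K0 = np.array([[30.0, 8.0]])  # assumed initial stabilizing gain
K_learned = q_learning_lqr(step, K0, Qc, R)
print("learned gain:", K_learned)
```

Because this toy plant is deterministic, each policy evaluation is exact up to numerical precision, so the gains follow the same sequence as model-based policy iteration and can be checked against a Riccati solution. Re-running the loop after a parameter change (for example a different `g_over_l`) is one simple way to picture online adaptation, though the paper's own adaptation mechanism may differ.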

