Actor-critic-based ink drop spread as an intelligent controller

This paper introduces an adaptive controller based on the actor-critic method, with the ink drop spread (IDS) method as its main engine. IDS is a recent soft-computing technique for universal fuzzy modeling that has also been used as a supervised controller; its processing closely resembles that of the human brain. The proposed actor-critic scheme uses an IDS structure as the actor and a 2-dimensional plane over the control state variables as the critic, which estimates the lifetime goodness of each state. The method is fast, simple, and free of heavy mathematical machinery, and it uses temporal-difference (TD) learning to update both the actor and the critic. The resulting system 1) learns to produce real-valued control actions in a continuous space without relying on an explicit Markov decision process formulation, 2) adaptively improves its performance over its lifetime, and 3) scales well to high-dimensional problems. To demonstrate its effectiveness, we conduct experiments on 3 systems: an inverted pendulum, a ball and beam, and a 2-wheel balancing robot. On each system, the method converges to a suitable fuzzy system with significant improvements in rise time and overshoot compared to other fuzzy controllers.
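
To make the described control loop concrete, the sketch below pairs a critic table over a discretized 2-dimensional state plane with an actor that emits real-valued actions, both corrected by the same TD error. This is an illustrative sketch, not the paper's IDS implementation: the grid resolution, learning rates, state ranges, Gaussian exploration, and the lookup-table actor standing in for the IDS structure are all assumptions.

```python
import numpy as np

# Minimal actor-critic sketch in the spirit of the abstract.
# All constants below are assumptions, not values from the paper.
N_BINS = 21          # resolution of the 2-D state plane (assumed)
ALPHA_CRITIC = 0.1   # critic learning rate (assumed)
ALPHA_ACTOR = 0.05   # actor learning rate (assumed)
GAMMA = 0.95         # discount factor (assumed)
SIGMA = 0.3          # std of Gaussian exploration noise (assumed)

critic = np.zeros((N_BINS, N_BINS))  # "lifetime goodness" of each state cell
actor = np.zeros((N_BINS, N_BINS))   # mean real-valued action per state cell

def discretize(theta, theta_dot):
    """Map a continuous (angle, angular velocity) pair to a plane cell.
    Ranges (+/- pi/6 rad, +/- 2 rad/s) are assumed for illustration."""
    i = int(np.clip((theta + np.pi / 6) / (np.pi / 3) * (N_BINS - 1),
                    0, N_BINS - 1))
    j = int(np.clip((theta_dot + 2.0) / 4.0 * (N_BINS - 1),
                    0, N_BINS - 1))
    return i, j

def act(state):
    """Real-valued control action: actor mean plus Gaussian exploration."""
    i, j = discretize(*state)
    return actor[i, j] + np.random.normal(0.0, SIGMA)

def step(state, next_state, reward, action_taken):
    """One TD update of both the critic and the actor, as in the abstract."""
    i, j = discretize(*state)
    ni, nj = discretize(*next_state)
    td_error = reward + GAMMA * critic[ni, nj] - critic[i, j]
    critic[i, j] += ALPHA_CRITIC * td_error
    # Pull the actor's mean toward the explored action when the TD error
    # is positive, and away from it when negative:
    actor[i, j] += ALPHA_ACTOR * td_error * (action_taken - actor[i, j])
    return td_error
```

In a closed loop, `act` would supply the plant's control signal (e.g., the pendulum's motor torque) at each sample instant, and `step` would refine both tables once the plant's response and reward are observed.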
