Relational-grid-world: a novel relational reasoning environment and an agent model for relational information extraction


Reinforcement learning (RL) agents are often designed for one particular problem, and their decision processes are generally uninterpretable. Agent algorithms based on statistical methods can be made more generalizable and interpretable using symbolic artificial intelligence (AI) tools such as logic programming. In this study, we present a model-free RL architecture that is supported with explicit relational representations of the environmental objects. For the first time, we apply the PrediNet network architecture to a dynamic decision-making problem rather than image-based tasks, using the multi-head dot-product attention network (MHDPA) as a baseline for performance comparisons. We tested the two networks in two environments: the baseline box-world environment and our novel relational-grid-world (RGW) environment. The procedurally generated RGW environment, which is complex in terms of visual perception and combinatorial selection, makes it easy to measure the relational representation performance of RL agents. The experiments were carried out using different configurations of the environment so that the presented module and the environment could be compared with the baselines. The PrediNet architecture achieved policy optimization performance similar to that of MHDPA. Additionally, we were able to extract the propositional representation explicitly, which makes the agent's statistical policy logic more interpretable and tractable. This flexibility in the agent's policy provides convenience for designing non-task-specific agent architectures. The main contributions of this study are twofold: an RL agent that can explicitly perform relational reasoning, and a new environment that measures the relational reasoning capabilities of RL agents.
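The MHDPA baseline mentioned above treats the environment's objects as a set of entity vectors and lets each entity attend to every other one via scaled dot-product attention, so the attention matrix acts as a soft relation map between objects. The following is a minimal NumPy sketch of that mechanism only; the entity count, feature sizes, and random projection weights are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_dot_product_attention(entities, num_heads, d_head, rng):
    """Self-attention over a set of entity vectors of shape (N, d_model).

    Each head projects the entities to queries, keys, and values, then
    mixes the values by softmax-scaled dot-product similarity; head
    outputs are concatenated. Projection weights are random here,
    purely for illustration (in practice they are learned).
    """
    n, d_model = entities.shape
    outputs = []
    for _ in range(num_heads):
        w_q = rng.standard_normal((d_model, d_head))
        w_k = rng.standard_normal((d_model, d_head))
        w_v = rng.standard_normal((d_model, d_head))
        q, k, v = entities @ w_q, entities @ w_k, entities @ w_v
        attn = softmax(q @ k.T / np.sqrt(d_head))  # (N, N) soft relation map
        outputs.append(attn @ v)                   # (N, d_head) per head
    return np.concatenate(outputs, axis=-1)        # (N, num_heads * d_head)

rng = np.random.default_rng(0)
entities = rng.standard_normal((5, 8))  # e.g., 5 grid objects, 8 features each
out = multi_head_dot_product_attention(entities, num_heads=2, d_head=4, rng=rng)
print(out.shape)  # (5, 8)
```

Each row of `attn` sums to 1, so every output entity is a convex combination of the value vectors of all entities; PrediNet differs by making the extracted relations explicitly propositional rather than leaving them implicit in such attention weights.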

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Issues per year: 6
  • Publisher: TÜBİTAK