Relational-grid-world: a novel relational reasoning environment and an agent model for relational information extraction


Reinforcement learning (RL) agents are often designed for one particular problem, and their decision processes are generally uninterpretable. Agent algorithms based on statistical methods can be made more generalizable and interpretable using symbolic artificial intelligence (AI) tools such as logic programming. In this study, we present a model-free RL architecture that is supported with explicit relational representations of the environmental objects. For the first time, we apply the PrediNet network architecture to a dynamic decision-making problem rather than image-based tasks, using the multi-head dot-product attention network (MHDPA) as a baseline for performance comparisons. We tested the two networks in two environments: the baseline box-world environment and our novel relational-grid-world (RGW) environment. The procedurally generated RGW environment, which is complex in terms of visual perception and combinatorial selection, makes it easy to measure the relational representation performance of RL agents. The experiments were carried out using different configurations of the environment so that the presented module and the environment could be compared with the baselines. The PrediNet architecture achieved policy optimization performance similar to that of MHDPA. Additionally, we were able to extract the propositional representation explicitly, which makes the agent's statistical policy logic more interpretable and tractable. This flexibility in the agent's policy provides convenience for designing non-task-specific agent architectures. The main contributions of this study are twofold: an RL agent that can explicitly perform relational reasoning, and a new environment that measures the relational reasoning capabilities of RL agents.
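The MHDPA baseline mentioned above treats the environment's objects as a set of entity vectors and lets each entity attend to every other one via scaled dot-product attention, so the attention matrix acts as a soft relation map between objects. The following is a minimal NumPy sketch of that mechanism only; the entity count, feature sizes, and random projection weights are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_dot_product_attention(entities, num_heads, d_head, rng):
    """Self-attention over a set of entity vectors of shape (N, d_model).

    Each head projects the entities to queries, keys, and values, then
    mixes the values by softmax-scaled dot-product similarity; head
    outputs are concatenated. Projection weights are random here,
    purely for illustration (in practice they are learned).
    """
    n, d_model = entities.shape
    outputs = []
    for _ in range(num_heads):
        w_q = rng.standard_normal((d_model, d_head))
        w_k = rng.standard_normal((d_model, d_head))
        w_v = rng.standard_normal((d_model, d_head))
        q, k, v = entities @ w_q, entities @ w_k, entities @ w_v
        attn = softmax(q @ k.T / np.sqrt(d_head))  # (N, N) soft relation map
        outputs.append(attn @ v)                   # (N, d_head) per head
    return np.concatenate(outputs, axis=-1)        # (N, num_heads * d_head)

rng = np.random.default_rng(0)
entities = rng.standard_normal((5, 8))  # e.g., 5 grid objects, 8 features each
out = multi_head_dot_product_attention(entities, num_heads=2, d_head=4, rng=rng)
print(out.shape)  # (5, 8)
```

Each row of `attn` sums to 1, so every output entity is a convex combination of the value vectors of all entities; PrediNet differs by making the extracted relations explicitly propositional rather than leaving them implicit in such attention weights.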

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Issues per year: 6
  • Publisher: TÜBİTAK