SIMULATION OF VEHICLES' GAP ACCEPTANCE DECISIONS USING REINFORCEMENT LEARNING

This paper presents a reinforcement learning approach for modeling vehicles' gap acceptance decisions at a stop-controlled T-intersection. The proposed formulation translates a simple gap acceptance decision into a reinforcement learning problem, assuming that drivers' ultimate objective in a traffic network is to optimize wait-time and safety. Drivers are simulated in an off-the-shelf traffic simulation tool with no prior notion of the outcomes of their decisions; over many episodes of gap acceptance decisions, they learn the consequences of their actions, i.e., wait-time and safety. A real-world traffic circle network developed in the Paramics simulation software is used to conduct the experimental analyses. The results show that drivers' gap acceptance behavior in microscopic traffic simulation models can be validated with a high level of accuracy using the Q-learning algorithm.
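The abstract does not spell out the formulation in detail, but the following is a minimal sketch of how a single gap acceptance decision can be cast as a Q-learning problem. The discretized gap-size state, the binary accept/reject action set, and all reward constants (wait-time penalty, risk penalty, critical gap) are illustrative assumptions, not the authors' calibrated model; in the study itself the Paramics simulation plays the role of the environment.

    import random

    # Toy Q-learning sketch for a gap acceptance decision (illustrative only).
    # States: gap sizes discretized into 1-second bins (0..MAX_GAP-1).
    # Actions: 0 = reject (keep waiting), 1 = accept (enter the gap).
    MAX_GAP = 10          # number of gap bins, in seconds (assumed)
    CRITICAL_GAP = 4      # gaps shorter than this are "unsafe" in this toy
    ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # learning rate, discount, exploration

    Q = [[0.0, 0.0] for _ in range(MAX_GAP)]

    def step(gap, action):
        """Return (reward, next_gap, done); stands in for the simulator."""
        if action == 1:   # accept: episode ends, reward trades risk for progress
            return (10.0 if gap >= CRITICAL_GAP else -50.0), None, True
        # reject: pay a wait-time penalty and observe the next random gap
        return -1.0, random.randrange(MAX_GAP), False

    for episode in range(20000):
        gap, done = random.randrange(MAX_GAP), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < EPS:
                action = random.randrange(2)
            else:
                action = 0 if Q[gap][0] > Q[gap][1] else 1
            reward, next_gap, done = step(gap, action)
            target = reward if done else reward + GAMMA * max(Q[next_gap])
            Q[gap][action] += ALPHA * (target - Q[gap][action])  # Q-learning update
            if not done:
                gap = next_gap

    # The greedy policy should accept gaps at or above the toy critical gap
    # and reject shorter ones.
    for g in range(MAX_GAP):
        print(g, "accept" if Q[g][1] > Q[g][0] else "reject")

Under these assumed rewards, the learned policy converges to accepting gaps at or above the critical gap and rejecting shorter ones, which is the kind of critical-gap behavior the paper validates against the simulation model.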

REFERENCES

  • 1. Abdulhai, B. and Kattan, L. (2003) Reinforcement Learning: Introduction to Theory and Potential for Transport Applications, Canadian Journal of Civil Engineering, 30, 981-991. doi: 10.1139/l03-014
  • 2. Abdulhai, B., Pringle, R. and Karakoulas, G. J. (2003) Reinforcement learning for true adaptive traffic signal control, Journal of Transportation Engineering, 129(3), 278-285. doi: 10.1061/(ASCE)0733-947X(2003)129:3(278)
  • 3. Arel, I., Liu, C., Urbanik, T. and Kohls, A. G. (2010) Reinforcement learning based multiagent system for network traffic signal control, IET Intelligent Transport Systems, 4(2), 128-135. doi: 10.1049/iet-its.2009.0070
  • 4. Ashton, W. D. (1971) Gap acceptance problems at a traffic intersection, Applied Statistics, 20(2), 130-138. doi: 10.2307/2346461
  • 5. Bartin, B., Ozbay, K., Yanmaz-Tuzel, O. and List, G. (2006) Modeling and Simulation of Unconventional Traffic Circles, Transportation Research Record: Journal of the Transportation Research Board, 1965, 201-209. doi: 10.3141/1965-21
  • 6. Barton, R. R. and Schruben, L. W. (2001) Resampling methods for input modeling, Proceedings of the 2001 Winter Simulation Conference, 1, 372-378. doi: 10.1109/WSC.2001.977303
  • 7. Bazzan, A. L. C., Oliveira, D. and Silva, B. C. (2010) Learning in groups of traffic signals, Engineering Applications of Artificial Intelligence, 23, 560-568. doi: 10.1016/j.engappai.2009.11.009
  • 8. Bingham, E. (2001) Reinforcement learning in neurofuzzy traffic signal control, European Journal of Operational Research, 131, 232-241. doi: 10.1016/S0377-2217(00)00123-5
  • 9. Bombol, K., Koltovska, D. and Veljanovska, K. (2012) Application of reinforcement learning as a tool of adaptive traffic signal control on isolated intersections, IACSIT International Journal of Engineering and Technology, 4(2), 126-129. doi: 10.7763/IJET.2012.V4.332
  • 10. Bull, L., Sha'aban, J., Tomlinson, A., Addison, J. D. and Heydecker, B. G. (2004) Towards distributed adaptive control for road traffic junction signals using learning classifier systems, In: Bull, L. (ed.) Applications of Learning Classifier Systems, 279-299. Springer: New York. doi: 10.1007/978-3-540-39925-4
  • 11. Daganzo, C. (1981) Estimation of gap acceptance parameters within and across the population from direct roadside observation, Transportation Research Part B, 15B, 1-15. doi: 10.1016/0191-2615(81)90042-4
  • 12. Dowling, R., Skabardonis, A. and Alexiadis, V. (2004) Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software, FHWA Contract DTFH61-01-C-00181, FHWA.
  • 13. El-Tantawy, S., Abdulhai, B. and Abdelgawad, H. (2013) Multiagent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC): Methodology and Large-Scale Application on Downtown Toronto, IEEE Transactions on Intelligent Transportation Systems, 14(3), 1140-1150. doi: 10.1109/TITS.2013.2255286
  • 14. Gattis, J. L. and Low, S. (1999) Gap acceptance at atypical stop-controlled intersections, Journal of Transportation Engineering, 125(3), 201-207. doi: 10.1061/(ASCE)0733-947X(1999)125:3(201)
  • 15. Gelenbe, E., Seref, E. and Xu, Z. (2001) Simulation with learning agents, Proceedings of the IEEE, 89(2), 148-157. doi: 10.1109/5.910851
  • 16. Hamed, M. M., Easa, S. M. and Batayneh, R. R. (1997) Disaggregate gap-acceptance model for unsignalized T-intersections, Journal of Transportation Engineering, 123(1), 36-42. doi: 10.1061/(ASCE)0733-947X(1997)123:1(36)
  • 17. Holland, J. H. (1976) Adaptation, In Rosen & Snell (eds) Progress in Theoretical Biology, 4. Plenum.
  • 18. Iyer, S., Ozbay, K. and Bartin, B. (2010) Ex Post Evaluation of Calibrated Simulation Models of Significantly Different Future Systems, Transportation Research Record: Journal of the Transportation Research Board, 2161, 49-56. doi: 10.3141/2161-06
  • 19. Mahmassani, H. and Sheffi, Y. (1981) Using gap acceptance sequences to estimate gap acceptance functions, Transportation Research Part B, 15B, 143-148. doi: 10.1016/0191-2615(81)90001-1
  • 20. Maze, T. (1981) A probabilistic model of gap acceptance behavior, Transportation Research Record, 795, 8-13.
  • 21. Mitchell, T. M. (1997) Machine Learning, McGraw Hill Higher Education.
  • 22. Moriarty, D. E. and Langley, P. (1998) Learning cooperative lane selection strategies for highways, Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence, 684-691, July, Madison, Wisconsin, United States.
  • 23. Nagel, K. (2004) Route learning in iterated transportation studies, Human Behaviour and Traffic Networks, 305-318. doi: 10.1007/978-3-662-07809-9
  • 24. Ozan, C. (2012) Dynamic User Equilibrium Urban Network Design Based on Modified Reinforcement Learning Method (in Turkish), PhD Thesis, Pamukkale University, Science and Technology Institute, Civil Engineering Department, Transportation Division, Denizli, Turkey.
  • 25. Ozan, C., Ceylan, H. and Haldenbilen, S. (2014) Solving network design problem with dynamic network loading profiles using modified reinforcement learning method, Proceedings of the 16th Meeting of the EURO Working Group on Transportation, Procedia - Social and Behavioral Sciences, 111, 38-47. doi: 10.1016/j.sbspro.2014.01.036
  • 26. Ozan, C., Baskan, O., Haldenbilen, S. and Ceylan, H. (2015) A modified reinforcement learning algorithm for solving coordinated signalized networks, Transportation Research Part C: Emerging Technologies, 54, 40-55. doi: 10.1016/j.trc.2015.03.010
  • 27. Ozbay, K., Datta, A. and Kachroo, P. (2001) Modeling Route Choice Behavior Using Stochastic Learning Automata, Transportation Research Record, 1752, 38-46. doi: 10.3141/1752-06
  • 28. Ozbay, K., Datta, A. and Kachroo, P. (2002) Application of Stochastic Learning Automata for Modeling Departure Time and Route Choice Behavior, Transportation Research Record, 1807, 154-162. doi: 10.3141/1807-19
  • 29. Ozbay, K., Yang, H., Bartin, B. and Mudigonda, S. (2008) Derivation and validation of a new simulation-based surrogate safety measure, Transportation Research Record, 2083, 103-113. doi: 10.3141/2083-12
  • 30. Paramics Website. Access address: http://www.paramics-online.com/ (Accessed on April 7, 2017)
  • 31. Pendrith, M. D. (2000) Distributed reinforcement learning for a traffic engineering application, Proceedings of the Fourth International Conference on Autonomous Agents, 404-411, June 03-07, Barcelona, Spain. doi: 10.1145/336595.337554
  • 32. Pollatschek, M. A., Polus, A. and Livneh, M. (2002) A Decision Model for Gap Acceptance and Capacity at Intersections, Transportation Research Part B, 36, 649-663. doi: 10.1016/S0191-2615(01)00024-8
  • 33. Polus, A., Lazar, S. S. and Livneh, M. (2003) Critical gap as a function of waiting time in determining roundabout capacity, Journal of Transportation Engineering, 129(5), 504-509. doi: 10.1061/(ASCE)0733-947X(2003)129:5(504)
  • 34. Polus, A., Shiftan, Y., and Shmueli-Lazar, S. (2005) Evaluation of the waiting-time effect on critical gaps at roundabouts by a logit model, European Journal of Transport and Infrastructure Research, 5(1), 1-12.
  • 35. Rezaee, K., Abdulhai, B. and Abdelgawad, H. (2012) Application of reinforcement learning with continuous state space to ramp metering in real-world conditions, 15th International IEEE Conference on Intelligent Transportation Systems, Anchorage, Alaska, USA. doi: 10.1109/MITS.2012.2217592
  • 36. Russell, S. J. and Norvig, P. (2003) Artificial intelligence: A modern approach, Prentice Hall series in artificial intelligence. Upper Saddle River, N.J.: Prentice Hall/Pearson Education.
  • 37. Sacks, J., Rouphail, N. M., Park, B., Thakuriah, P., Rilett, L. R., Spiegelman, C. H. and Morris, M. D. (2002) Statistically-Based Validation of Computer Simulation Models in Traffic Operations and Management, Journal of Transportation and Statistics, 5(1), 1-24.
  • 38. Sutton, R. S. and Barto, A.G. (1998) Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
  • 39. Teply, S., Abou-Henaidy, M. and Hunt, J. D. (1997) Gap acceptance behavior - aggregate and logit perspectives: Part 1, Traffic Engineering and Control, 38(9), 474-482.
  • 40. Vanhulsel, M., Janssens, D., Wets, G. and Vanhoof, K. (2009) Simulation of sequential data: An enhanced reinforcement learning approach, Expert Systems with Applications, 36, 8032-8039. doi: 10.1016/j.eswa.2008.10.056
  • 41. Yanmaz-Tuzel, O. (2010) Modeling traveler behavior via day-to-day learning dynamics, Ph.D. Thesis, Rutgers, The State University of New Jersey.
  • 42. Yanmaz-Tuzel, O. and Ozbay, K. (2009) Chapter 19: Modeling Learning Impacts on Day-to-Day Travel Choice, Transportation and Traffic Theory 2009: Golden Jubilee, 387-403. doi: 10.1007/978-1-4419-0820-9_19
  • 43. Wiering, M. A. (2000) Learning to control traffic lights with multi-agent reinforcement learning, First World Congress of the Game Theory Society (Games 2000), Basque Country University and Foundation, Spain.