Comparative Evaluation for Effectiveness Analysis of Policy Based Deep Reinforcement Learning Approaches

Authors

Z. Tan, M. Karaköse

DOI:

https://doi.org/10.24203/ijcit.v10i3.104

Keywords:

Deep Reinforcement Learning, Deep Learning, Multi Agent

Abstract

Deep Reinforcement Learning (DRL) has proven to be a powerful technique, with strong results across a range of applications in recent years. Achievements in robotics in particular suggest that considerable further progress can be expected in that field. Policy choice and parameter settings play an important role in the success of DRL. This study analyzes the policies used in recent DRL work and groups those found in the literature under three headings: value-based, policy-based and actor-critic. In addition, it presents a problem in which collaborative agents move a common target according to Newton's laws of motion. Training is carried out on a two-dimensional surface in a frictionless environment with two agents and one object, using four different policies. The agents push the object by colliding with it and try to move it out of the area it occupies. The training outcome of each policy is reported separately and its success is assessed. Test results are discussed in Section 5. The study thus both surveys the policies used in deep reinforcement learning approaches and tests them in an application.
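The abstract does not give the simulation details. As a minimal sketch of the frictionless, collision-driven dynamics it describes, the Python below assumes disc-shaped bodies, perfectly elastic collisions and a semi-implicit Euler integrator. These choices, and the function names `step` and `elastic_collision`, are illustrative assumptions and are not taken from the paper.

```python
import math

def step(pos, vel, force, mass, dt):
    """Frictionless Newtonian update (a = F / m) with semi-implicit Euler.

    No drag or friction term is applied, so velocity changes only
    through the applied force. All arguments are hypothetical.
    """
    ax, ay = force[0] / mass, force[1] / mass
    vx, vy = vel[0] + ax * dt, vel[1] + ay * dt
    return (pos[0] + vx * dt, pos[1] + vy * dt), (vx, vy)

def elastic_collision(p1, v1, m1, p2, v2, m2, contact_dist):
    """Return post-collision velocities of two discs (assumed elastic).

    Velocities are unchanged if the discs are farther apart than
    contact_dist or are already moving away from each other.
    """
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > contact_dist:
        return v1, v2
    # Unit normal pointing from body 2 to body 1.
    nx, ny = dx / dist, dy / dist
    # Relative velocity along the normal; negative means approaching.
    rel = (v1[0] - v2[0]) * nx + (v1[1] - v2[1]) * ny
    if rel >= 0.0:
        return v1, v2
    k = 2.0 * rel / (m1 + m2)
    v1_new = (v1[0] - k * m2 * nx, v1[1] - k * m2 * ny)
    v2_new = (v2[0] + k * m1 * nx, v2[1] + k * m1 * ny)
    return v1_new, v2_new
```

In an environment built on this sketch, each agent's action would supply the `force` passed to `step`, and `elastic_collision` would be applied between each agent and the object after every step. With equal masses and a head-on contact, the two bodies exchange velocities, which is how a moving agent transfers its momentum to the object.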

Published

2021-06-18

How to Cite

Tan, Z., & Karaköse, M. (2021). Comparative Evaluation for Effectiveness Analysis of Policy Based Deep Reinforcement Learning Approaches. International Journal of Computer and Information Technology (2279-0764), 10(3). https://doi.org/10.24203/ijcit.v10i3.104

Section

Articles