Exploring demonstration pre-training with improved Deep Q-learning

Max Pettersson; Florian Westphal; Maria Riveiro

doi:10.3384/ecp208008

Abstract

This study explores the effects of using demonstrations to pre-train an improved Deep Q-Network (DQN). The approach takes inspiration from methods such as Deep Q-learning from Demonstrations (DQfD), but instead of retaining the demonstrations throughout training, it uses them solely for pre-training and studies the resulting effects on the policy's performance and behavior. A comparative experiment is performed on two game environments, Gymnasium's Car Racing and Atari Space Invaders. In Car Racing, demonstration pre-training improves learning, as indicated by higher evaluation and training rewards. These improvements do not appear in Space Invaders, where the pre-trained agent instead underperforms. This divergence suggests that the nature of a game's reward structure influences the effectiveness of demonstration pre-training. Despite the less pronounced quantitative differences, qualitative observations suggest distinctive strategic behaviors, notably in target elimination patterns in Space Invaders. These behaviors, initially retained from the demonstrations, appear to be forgotten during extended training. The results indicate a need for further investigation into how exploration functions affect the effectiveness of demonstration pre-training, how behaviors can be retained without explicitly making the agent mimic demonstrations, and how non-optimal demonstrations can be incorporated for more stable learning with demonstrations.
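
The two-phase idea described in the abstract (learn from demonstrations first, then discard them and continue with ordinary training) can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the improved DQN with a tabular Q-function, uses a made-up six-state chain environment instead of Car Racing or Space Invaders, and every name in it (`step`, `pretrain`, `train_online`, `make_demos`, the hyperparameters) is a hypothetical choice made for illustration only.

```python
import random

# Assumed toy settings, not taken from the paper.
N_STATES, ACTIONS, GAMMA, ALPHA = 6, (0, 1), 0.9, 0.5

def step(state, action):
    """Toy chain: action 1 moves right, 0 moves left; reward 1 on reaching the last state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_update(q, s, a, r, s2, done):
    """One-step Q-learning (temporal-difference) update on a single transition."""
    target = r if done else r + GAMMA * max(q[s2])
    q[s][a] += ALPHA * (target - q[s][a])

def pretrain(q, demos, passes=20):
    """Phase 1: update the Q-table from demonstration transitions only."""
    for _ in range(passes):
        for s, a, r, s2, done in demos:
            q_update(q, s, a, r, s2, done)

def train_online(q, episodes=50, epsilon=0.1, max_steps=100, seed=0):
    """Phase 2: epsilon-greedy interaction; demonstrations are not used here."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            q_update(q, s, a, r, s2, done)
            s = s2
            if done:
                break

def make_demos():
    """Hypothetical demonstrator that always moves right, starting from state 0."""
    demos, s, done = [], 0, False
    while not done:
        s2, r, done = step(s, 1)
        demos.append((s, 1, r, s2, done))
        s = s2
    return demos

def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]

q = [[0.0, 0.0] for _ in range(N_STATES)]
demos = make_demos()
pretrain(q, demos)
policy_after_pretrain = greedy_policy(q)
demos.clear()  # demonstrations are dropped before online training begins
train_online(q)
policy_after_training = greedy_policy(q)
```

Comparing `policy_after_pretrain` with `policy_after_training` is a crude analogue of asking whether demonstrated behavior survives extended training. In this toy chain the demonstrated policy is also the optimal one, so the sketch cannot reproduce the forgetting effect reported for Space Invaders.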
