
Policy optimisation and generalisation for reinforcement learning agents in sparse reward navigation environments.


Sparse reward environments are prevalent in the real world, and training reinforcement learning agents in them remains a substantial challenge. Two particularly pertinent problems in these environments are policy optimisation and policy generalisation. This work focuses on the navigation task, in which agents learn to navigate past obstacles to distant targets and are rewarded only on completion of the task. A novel compound reward function, Directed Curiosity, a weighted sum of curiosity-driven exploration rewards and distance-based shaped rewards, is presented. The technique allowed for faster convergence and enabled agents to gain more rewards than agents trained with distance-based shaped rewards or curiosity alone. However, it resulted in policies that were highly optimised for the specific environment the agents were trained on, and therefore did not generalise well to unseen environments. A training curriculum was designed to address this and resulted in the transfer of knowledge, when using the policy "as-is", to unseen testing environments. It also eliminated the need for additional reward shaping and was shown to converge faster than curiosity-based agents. Combining curiosity with the curriculum provided no meaningful benefits and exhibited inferior policy generalisation.
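The compound reward described above can be sketched as a simple weighted sum. The weights and reward components below are illustrative placeholders, not the values used in the thesis:

```python
def directed_curiosity_reward(curiosity_reward: float,
                              distance_reward: float,
                              w_curiosity: float = 0.5,
                              w_distance: float = 0.5) -> float:
    """Combine a curiosity-driven exploration reward with a
    distance-based shaped reward as a weighted sum.

    The weighting scheme (here a plain convex combination with
    hypothetical default weights) is an assumption for illustration.
    """
    return w_curiosity * curiosity_reward + w_distance * distance_reward


# Example: an agent receives an intrinsic curiosity signal of 0.8
# and a distance-based shaping signal of 0.2 at some timestep.
total = directed_curiosity_reward(0.8, 0.2)
```

In practice the curiosity term would come from an intrinsic-motivation module (e.g. a prediction-error signal) and the distance term from the agent's proximity to the target; tuning the two weights trades off exploration against goal-directed shaping.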


Master's degree. University of KwaZulu-Natal, Durban.