Policy optimisation and generalisation for reinforcement learning agents in sparse reward navigation environments.
dc.contributor.advisor | Pillay, Anban Woolaganathan. | |
dc.contributor.advisor | Jembere, Edgar. | |
dc.contributor.author | Jeewa, Asad. | |
dc.date.accessioned | 2023-04-13T13:08:53Z | |
dc.date.available | 2023-04-13T13:08:53Z | |
dc.date.created | 2021 | |
dc.date.issued | 2021 | |
dc.description | Masters Degree. University of KwaZulu-Natal, Durban. | en_US |
dc.description.abstract | Sparse reward environments are prevalent in the real world, and training reinforcement learning agents in them remains a substantial challenge. Two particularly pertinent problems in these environments are policy optimisation and policy generalisation. This work focuses on the navigation task, in which agents learn to navigate past obstacles to distant targets and are rewarded on completion of the task. A novel compound reward function, Directed Curiosity, a weighted sum of curiosity-driven exploration and distance-based shaped rewards, is presented. The technique allowed for faster convergence and enabled agents to gain more rewards than agents trained with the distance-based shaped rewards or curiosity alone. However, it resulted in policies that were highly optimised for the specific environment that the agents were trained on, and that therefore did not generalise well to unseen environments. A training curriculum was designed to address this, and it resulted in the transfer of knowledge, when using the policy “as-is”, to unseen testing environments. It also eliminated the need for additional reward shaping and was shown to converge faster than curiosity-based agents. Combining curiosity with the curriculum provided no meaningful benefits and exhibited inferior policy generalisation. | en_US |
dc.identifier.uri | https://researchspace.ukzn.ac.za/handle/10413/21412 | |
dc.language.iso | en | en_US |
dc.subject.other | Curriculum learning. | en_US |
dc.subject.other | Machine learning--Reinforcement learning. | en_US |
dc.subject.other | Incentive awards. | en_US |
dc.subject.other | Employee motivation. | en_US |
dc.title | Policy optimisation and generalisation for reinforcement learning agents in sparse reward navigation environments. | en_US |
dc.type | Thesis | en_US |