
Policy optimisation and generalisation for reinforcement learning agents in sparse reward navigation environments.

dc.contributor.advisor: Pillay, Anban Woolaganathan.
dc.contributor.advisor: Jembere, Edgar.
dc.contributor.author: Jeewa, Asad.
dc.date.accessioned: 2023-04-13T13:08:53Z
dc.date.available: 2023-04-13T13:08:53Z
dc.date.created: 2021
dc.date.issued: 2021
dc.description: Masters Degree. University of KwaZulu-Natal, Durban. [en_US]
dc.description.abstract: Sparse reward environments are prevalent in the real world and training reinforcement learning agents in them remains a substantial challenge. Two particularly pertinent problems in these environments are policy optimisation and policy generalisation. This work is focused on the navigation task in which agents learn to navigate past obstacles to distant targets and are rewarded on completion of the task. A novel compound reward function, Directed Curiosity, a weighted sum of curiosity-driven exploration and distance-based shaped rewards, is presented. The technique allowed for faster convergence and enabled agents to gain more rewards than agents trained with the distance-based shaped rewards or curiosity alone. However, it resulted in policies that were highly optimised for the specific environment that the agents were trained on, and therefore did not generalise well to unseen environments. A training curriculum was designed for this purpose and resulted in the transfer of knowledge, when using the policy “as-is”, to unseen testing environments. It also eliminated the need for additional reward shaping and was shown to converge faster than curiosity-based agents. Combining curiosity with the curriculum provided no meaningful benefits and exhibited inferior policy generalisation. [en_US]
dc.identifier.uri: https://researchspace.ukzn.ac.za/handle/10413/21412
dc.language.iso: en [en_US]
dc.subject.other: Curriculum learning. [en_US]
dc.subject.other: Machine learning--Reinforcement learning. [en_US]
dc.subject.other: Incentive awards. [en_US]
dc.subject.other: Employee motivation. [en_US]
dc.title: Policy optimisation and generalisation for reinforcement learning agents in sparse reward navigation environments. [en_US]
dc.type: Thesis [en_US]
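
The abstract describes Directed Curiosity as a weighted sum of a curiosity-driven exploration reward and a distance-based shaped reward. The sketch below is only an illustration of that idea, not the thesis implementation: the function names, the default weights, and the form of the distance term (reduction in distance to the target between two steps) are all assumptions introduced here.

```python
def distance_shaped_reward(prev_distance: float, new_distance: float) -> float:
    """Hypothetical shaping term: positive when the agent moves closer to the target."""
    return prev_distance - new_distance


def directed_curiosity_reward(
    curiosity_reward: float,
    distance_reward: float,
    curiosity_weight: float = 0.5,  # placeholder weight, not a value from the thesis
    distance_weight: float = 0.5,   # placeholder weight, not a value from the thesis
) -> float:
    """Combine an intrinsic curiosity signal and a shaped distance signal as a weighted sum."""
    return curiosity_weight * curiosity_reward + distance_weight * distance_reward


# Example step: the agent moved from 4.0 to 3.0 units away from the target
# and received an (assumed) intrinsic curiosity signal of 0.25.
shaped = distance_shaped_reward(4.0, 3.0)
step_reward = directed_curiosity_reward(0.25, shaped)
```

In a sketch like this, the two weights control how strongly the agent is pulled towards the target versus towards unexplored states; the record does not state which values the thesis used.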

Files

Original bundle

Name: Asad_Jeewa_2021.pdf
Size: 21.97 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.64 KB
Format: Item-specific license agreed upon to submission