Dissertation

Learning navigation policies with deep reinforcement learning


Published in
Bibliographic information
Year of publication: 2021
DOI: 10.6094/UNIFR/218235
URN: urn:nbn:de:bsz:25-freidok-2182353
Language: English
Abstract
  • English
Humans learn that achieving long-term goals often requires making efforts, taking risks, and putting ourselves in difficult positions, since every decision we make not only influences our immediate state but can also have future implications. In this thesis, we study methods for control problems that involve sequential decision making, in which the actions of intelligent agents affect the environment they operate in. In particular, we focus on solutions that require the least amount of human intervention, seeking general algorithms that can help automate the development of intelligent decision-making agents. We therefore build on the general framework of deep reinforcement learning to learn control policies through interaction with the environment. As navigation is an essential skill for autonomous intelligent systems, this thesis takes learning to navigate as its main running task, setting out to address several challenges that arise when learning optimal policies directly from sensory inputs. The thesis begins by asking whether it is feasible to replace the traditional navigation pipeline with an end-to-end deep reinforcement learning system, and then proposes algorithms that facilitate transferring learned navigation policies to related task instances. The focus then turns to learning navigation in environments that are challenging to explore, where we interface a canonical agent with an external memory inside a fully differentiable neural network. By learning to write to and read from this external memory, the agent is able to make informed decisions in hard navigation tasks. Afterwards, we target transferring deep reinforcement learning policies learned in simulation to the real world. Questioning the canonical sim-to-real approaches, we propose a real-to-sim algorithm as a lightweight and flexible alternative.
Additionally, we propose a novel shift loss, agnostic to the downstream task, that imposes consistency constraints, successfully adapting single-frame domain adaptation approaches to sequential problems. Finally, this thesis pays particular attention to learning control policies in terminal-reward settings, as this scenario requires the fewest human priors and would thus largely automate the training of artificial decision-making agents. Since structured and guided exploration becomes vital in this case, we again question the mainstream approach of utilizing intrinsic motivation as a reward bonus, taking a hierarchical view on accelerating exploration. We argue that our proposed approach is a more suitable treatment of intrinsically motivated exploration, as the behavior policy space is implicitly increased exponentially. Moreover, we propose a novel intrinsic reward that takes a temporally extended view on states, which facilitates exploration even further. In summary, this thesis investigates several key aspects of learning control policies through deep reinforcement learning, with a focus on navigation tasks. We hope that our proposed methods offer insights to the learning-control community.
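The idea behind the shift loss described in the abstract can be illustrated with a minimal NumPy sketch. This is an illustrative reconstruction, not the thesis implementation: the names `shift`, `shift_loss`, and the toy mapping `G` are hypothetical, and a circular `np.roll` stands in for the image translation that would be applied to real frame sequences. The consistency constraint is that translating the input should translate the output identically, independent of the downstream task.

```python
import numpy as np

def shift(img, dx, dy):
    # circularly translate the image by (dx, dy) pixels; np.roll is a
    # stand-in for the translation applied to real image sequences
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def shift_loss(G, x, dx=4, dy=2):
    # consistency constraint: applying G after a shift should match
    # shifting the output of G, regardless of the downstream task
    return float(np.mean((G(shift(x, dx, dy)) - shift(G(x), dx, dy)) ** 2))

# toy pixel-wise "translation network": trivially shift-equivariant,
# so the loss evaluates to (numerically) zero
G = lambda img: 0.5 * img + 0.1

rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))
print(shift_loss(G, x))  # 0.0 for an exactly equivariant mapping
```

In training, this scalar would be added as an auxiliary term to the domain-adaptation objective, penalizing mappings whose outputs jitter when the input is merely translated between consecutive frames.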

Description

Files
License: Creative Commons CC BY 4.0 (Attribution)
DFG
This contribution is freely accessible with the consent of the rights holder (publisher) under an Alliance or National Licence (funded by the DFG).

"Learning navigation policies with deep reinforcement learning" by Jingwei Zhang is licensed under a Creative Commons CC BY 4.0 (Attribution) license.
  • thesis-zhang.pdf SHA256 checksum: 9da1cab07e96a059cb554bddd2ab7b148550c1053f7961c90274f5227125ab2d
    Download (28.59 MB)

  • Description of the research data

    Relations

    Examination details
    Faculty: Technische Fakultät (Faculty of Engineering)
    Supervisor: Burgard, Wolfram
    Date of examination: 29.04.2021