eCommons


Principled Off-Policy Imitation Learning via Boosting

Abstract

Imitation learning is a promising paradigm for learning policies that solve a variety of tasks from expert data. Off-policy imitation learning is particularly attractive to practitioners because, in principle, it allows the policy to improve using previously collected data, much like standard value-based off-policy reinforcement learning algorithms such as Deep Q-Learning and actor-critic methods. However, this is generally ill-defined: in principle, the policy improvement operator can only be applied to data collected by the most recent policy, which makes the algorithm on-policy. To mitigate this while remaining off-policy, we design an actor-critic method that treats the replay buffer as a collection of data from a set of weak learners. Our algorithm weights each weak learner’s data more appropriately when sampling for policy optimization, offering a principled way to mitigate this distribution mismatch in the off-policy setting. We apply this technique to both state-based and vision-based tasks in the DeepMind Control Suite and observe that our method improves sample efficiency.
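The idea of treating the replay buffer as data from a set of weak learners and weighting each learner's data at sampling time can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a Frank-Wolfe-style boosting mixture in which each new weak learner h_T enters the policy as pi_T = (1 - eta) * pi_{T-1} + eta * h_T, so each learner's data is sampled in proportion to its mixture weight. The class name `WeakLearnerReplayBuffer`, the step size `eta`, and this specific weighting rule are hypothetical and are not taken from the thesis.

```python
import random


class WeakLearnerReplayBuffer:
    """Replay buffer partitioned by the weak learner (policy iterate) that collected each transition.

    Hypothetical weighting rule (an assumption, not the thesis's method): the policy is a
    boosting mixture pi_T = (1 - eta) * pi_{T-1} + eta * h_T, so learner i of T gets weight
    eta * (1 - eta) ** (T - 1 - i), except the first learner, which keeps (1 - eta) ** (T - 1).
    """

    def __init__(self, eta=0.5, seed=0):
        if not 0.0 < eta <= 1.0:
            raise ValueError("eta must be in (0, 1]")
        self.eta = eta
        self.data = []  # data[i] holds the transitions collected by weak learner i
        self.rng = random.Random(seed)

    def start_new_learner(self):
        # Call once each time a new weak learner (policy) begins collecting data.
        self.data.append([])

    def add(self, transition):
        if not self.data:
            raise RuntimeError("call start_new_learner() before add()")
        self.data[-1].append(transition)

    def learner_weights(self):
        # Mixture weight of each weak learner; the weights sum to 1.
        T = len(self.data)
        weights = [self.eta * (1 - self.eta) ** (T - 1 - i) for i in range(T)]
        if T > 0:
            weights[0] = (1 - self.eta) ** (T - 1)
        return weights

    def sample(self, batch_size):
        # Pick a learner in proportion to its weight (skipping empty ones),
        # then pick a transition uniformly from that learner's data.
        weights = self.learner_weights()
        nonempty = [i for i, d in enumerate(self.data) if d]
        if not nonempty:
            raise ValueError("buffer is empty")
        chosen = self.rng.choices(nonempty, weights=[weights[i] for i in nonempty], k=batch_size)
        return [self.rng.choice(self.data[i]) for i in chosen]


buf = WeakLearnerReplayBuffer(eta=0.5)
for learner in range(3):
    buf.start_new_learner()
    for step in range(4):
        buf.add((learner, step))

print(buf.learner_weights())  # [0.25, 0.25, 0.5]
batch = buf.sample(8)  # eight (learner, step) tuples, drawn mostly from the newest learner
```

Under this assumed rule, older weak learners' data is down-weighted geometrically rather than discarded, so training stays off-policy while sampling favors data closer to the current policy mixture. Setting `eta=1` recovers purely on-policy sampling from the newest learner.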

Description

40 pages

Date Issued

2023-08

Keywords

Adversarial Learning; Imitation Learning; Reinforcement Learning

Committee Chair

Sun, Wen

Committee Member

Kleinberg, Robert

Degree Discipline

Computer Science

Degree Name

M.S., Computer Science

Degree Level

Master of Science

Types

dissertation or thesis