Learning Representations and Models of the World for Solving Complex Tasks
LG AI Research, University of Michigan, Ann Arbor
In recent years, deep learning has progressed tremendously in many fields of AI, such as visual perception, speech recognition, language understanding, and robotics. However, many of these methods require large amounts of supervision and generalize poorly to unseen tasks. We still have a long way to go toward a general-purpose artificial intelligence agent that can perform many useful tasks with high sample efficiency and strong generalization to previously unseen tasks. In this talk, I will present my recent work on tackling these challenges.

First, I will present several methods for learning representations and models of the environment that improve agents' exploration, sample efficiency, and generalization. For example, I will show that learning representations of the controllable aspects of the environment dynamics improves exploration in sparse-reward tasks. I will also present methods for learning latent representations from environment dynamics, which improve sample efficiency and generalization in various control tasks.

Finally, I will present a method for solving complex tasks with hierarchical, compositional dependencies between sub-tasks. Specifically, we propose and address a novel few-shot reinforcement learning problem in which each task is characterized by a sub-task graph, unknown to the agent, that describes a set of sub-tasks and their dependencies. Instead of directly learning a black-box meta-policy, we develop a Meta-learner with Subtask Graph Inference (MSGI), which infers the latent parameter of the task (the sub-task graph) by interacting with the environment and then maximizes the return given the inferred parameter. Experimental results on grid-world domains and game environments with compositional tasks show that the proposed method accurately infers the latent task structure and adapts more efficiently than prior methods.
I will further present a method for transferring prior task structures learned at training time to novel, unseen tasks at test time, which yields an order-of-magnitude gain in sample efficiency.
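To make the subtask-graph idea concrete, here is a toy sketch (my own illustration, not the actual MSGI algorithm or code). Assuming each subtask's precondition is a pure conjunction (AND) over other subtasks' completions, an agent can recover a subtask's hidden precondition set by intersecting the sets of completed subtasks it observed whenever that subtask was eligible. The graph, subtask names, and helper functions below are all hypothetical:

```python
import itertools

# Hypothetical ground-truth graph: "C" becomes eligible once A and B are done.
TRUE_PRECONDITIONS = {"A": set(), "B": set(), "C": {"A", "B"}}

def eligible(subtask, completed):
    # Environment-side rule: eligible iff all hidden preconditions are completed.
    return TRUE_PRECONDITIONS[subtask] <= completed

def infer_preconditions(target, samples):
    # Each sample is (completed_set, was_eligible). Under an AND model, the
    # precondition set is contained in every completed-set where the target
    # was eligible, so intersecting those sets yields a consistent estimate.
    positive = [done for done, ok in samples if ok]
    return set.intersection(*positive) if positive else set()

# Collect "experience": try every completion subset of {A, B} and record
# whether C was eligible under each.
samples = []
for r in range(3):
    for combo in itertools.combinations(["A", "B"], r):
        done = set(combo)
        samples.append((done, eligible("C", done)))

print(sorted(infer_preconditions("C", samples)))  # prints ['A', 'B']
```

Real tasks require richer inference (e.g. handling OR-nodes and noisy trajectories), but the same principle applies: interaction samples progressively constrain the latent task structure.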