, Associate Professor, University of California, Berkeley
Machine learning systems are useful in the real world only insofar as they can make decisions that lead to the outcomes we want. Whether we want a system to drive an autonomous car or an image recognition engine to tag our friends in photographs on social media, the predictions and outputs of machine learning systems lead to consequences, and we would like those systems to make the decisions whose consequences we prefer. This makes it natural to consider machine learning frameworks that reason directly about decisions and their consequences: namely, reinforcement learning. However, reconciling reinforcement learning with the data-driven paradigm under which most modern machine learning systems operate is difficult, because reinforcement learning in its classic form is an active, online learning paradigm. Can we get the best of both worlds: the data-driven approach of supervised and unsupervised learning, which can utilize large, previously collected datasets, and the decision-making formalism of reinforcement learning, which enables reasoning about decisions and their consequences? I will discuss how offline reinforcement learning can make this possible, and how offline RL can enable effective pre-training from suboptimal multi-task data, broad generalization in real-world domains, and compelling applications in settings such as robotics and dialogue systems.
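To make the distinction concrete, the sketch below shows the basic offline RL setup in its simplest form: a value function is trained entirely from a fixed dataset of logged transitions, with no further interaction with the environment. This is a minimal illustration of the problem setting, not the methods presented in the talk; the toy MDP, the random behavior policy, and all hyperparameters are assumptions made purely for the example.

```python
# Minimal illustrative sketch of offline (batch) RL: tabular Q-learning run
# entirely on a previously collected dataset of transitions, with no further
# environment interaction. The toy MDP and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: 5 states, 2 actions, random dynamics and rewards (assumed for the example).
n_states, n_actions, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
R = rng.normal(size=(n_states, n_actions))                        # R[s, a] = reward

# "Previously collected" data: transitions logged by a suboptimal (random) behavior policy.
def collect_dataset(num_transitions=5000):
    data = []
    s = rng.integers(n_states)
    for _ in range(num_transitions):
        a = rng.integers(n_actions)                    # random behavior policy
        s_next = rng.choice(n_states, p=P[s, a])
        data.append((s, a, R[s, a], s_next))
        s = s_next
    return data

dataset = collect_dataset()

# Offline learning: repeatedly sweep the fixed dataset; the MDP is never queried again.
Q = np.zeros((n_states, n_actions))
alpha = 0.1
for _ in range(50):                                    # passes over the logged data
    for s, a, r, s_next in dataset:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])

print("Greedy policy learned purely from logged data:", Q.argmax(axis=1))
```

In this toy setting the logged data covers the state-action space well, so naive Q-learning suffices; the practical challenges the talk addresses (distribution shift, suboptimal and multi-task data, generalization) arise precisely when that assumption fails.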