Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents

    Abstract

    Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e. contain local optima), and it is not known how to encourage such exploration with ES. Here we show that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resultant new algorithms, NS-ES and a version of QD we call NSR-ES, avoid local optima encountered by ES to achieve higher performance on tasks ranging from playing Atari to simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES.
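    To make the hybridization concrete, below is a minimal, illustrative sketch of the core NS-ES update under simplifying assumptions (it is not the paper's implementation): each policy rollout is summarized by a behavior characterization, novelty is measured as the mean distance to the nearest behaviors in an archive, and the usual ES gradient estimate is computed with novelty in place of reward. The `behavior` function, hyperparameters, and toy usage are hypothetical stand-ins; the full algorithm described in the paper additionally maintains a meta-population of agents, and NSR-ES combines novelty with the environment reward.

    ```python
    import numpy as np

    def novelty(bc, archive, k=10):
        """Mean distance from behavior characterization `bc` to its k nearest
        neighbors in an archive of previously observed behaviors."""
        if not archive:
            return 0.0
        dists = np.linalg.norm(np.asarray(archive) - bc, axis=1)
        return float(np.mean(np.sort(dists)[:k]))

    def ns_es_step(theta, archive, behavior, sigma=0.05, lr=0.01, n=100):
        """One ES-style update that follows the estimated gradient of novelty."""
        eps = np.random.randn(n, theta.size)                  # Gaussian perturbations
        scores = np.array([novelty(behavior(theta + sigma * e), archive)
                           for e in eps])                     # novelty of each sample
        scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalize
        grad = eps.T @ scores / (n * sigma)                   # ES gradient estimate
        return theta + lr * grad

    # Toy usage: treat the 2-D parameter vector itself as the behavior, so the
    # update keeps pushing the agent away from behaviors already in the archive.
    theta, archive = np.zeros(2), []
    for _ in range(50):
        archive.append(theta.copy())
        theta = ns_es_step(theta, archive, behavior=lambda p: p)
    ```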

    Team

    Uber AI

    Authors

    Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such, Joel Lehman, Kenneth O. Stanley, Jeff Clune

    Conference

    NeurIPS 2018

    Full Paper

    ‘Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents’ (PDF at arXiv)

    Vashisht Madhavan
    Vashisht (Vash) is a recent graduate of UC Berkeley, where he received his BS and MS in Computer Science with a focus on Computer Vision and Artificial Intelligence. At Berkeley, his work focused on perception systems for autonomous vehicles. His interests lie at the intersection of computer vision, machine learning, and reinforcement learning.
    Felipe Petroski Such
    Felipe Petroski Such is a research scientist focusing on deep neuroevolution, reinforcement learning, and HPC. Prior to joining Uber AI Labs, he obtained a BS/MS from RIT, where he developed deep learning architectures for graph applications and ICR, as well as hardware acceleration using FPGAs.
    Joel Lehman
    Joel Lehman was previously an assistant professor at the IT University of Copenhagen, and researches neural networks, evolutionary algorithms, and reinforcement learning.
    Kenneth O. Stanley
    Before joining Uber AI Labs full time, Ken was an associate professor of computer science at the University of Central Florida (he is currently on leave). He is a leader in neuroevolution (combining neural networks with evolutionary techniques), where he helped invent prominent algorithms such as NEAT, CPPNs, HyperNEAT, and novelty search. His ideas have also reached a broader audience through the recent popular science book, Why Greatness Cannot Be Planned: The Myth of the Objective.
    Jeff Clune
    Jeff Clune is the Loy and Edith Harris Associate Professor in Computer Science at the University of Wyoming and a Senior Research Manager and founding member of Uber AI Labs, which was formed after Uber acquired the startup Geometric Intelligence. Jeff focuses on robotics and on training neural networks via deep learning and deep reinforcement learning. He has also researched open questions in evolutionary biology using computational models of evolution, including studying the evolutionary origins of modularity, hierarchy, and evolvability. Prior to becoming a professor, he was a Research Scientist at Cornell University, received a PhD in computer science and an MA in philosophy from Michigan State University, and received a BA in philosophy from the University of Michigan. More about Jeff’s research can be found at JeffClune.com.