# Joel Lehman

## Engineering Blog Articles

### Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

Building on its existing open-ended learning research, Uber AI released Enhanced POET, a project that incorporates an improved algorithm and supports more diverse training environments.

### Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

Developed by Uber AI Labs, Generative Teaching Networks (GTNs) automatically generate training data, learning environments, and curricula to help AI agents rapidly learn.

### Introducing EvoGrad: A Lightweight Library for Gradient-Based Evolution

Uber AI Labs releases EvoGrad, a library for catalyzing gradient-based evolution research, and Evolvability ES, a new meta-learning algorithm enabled by this library.

### Creating a Zoo of Atari-Playing Agents to Catalyze the Understanding of Deep Reinforcement Learning

Uber AI Labs releases Atari Model Zoo, an open source repository of both trained Atari Learning Environment agents and tools to better understand them.

### POET: Endlessly Generating Increasingly Complex and Diverse Learning Environments and their Solutions through the...

Uber AI Labs introduces the Paired Open-Ended Trailblazer (POET), an algorithm that leverages open-endedness to push the bounds of machine learning.

### Montezuma’s Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems (Sets Records on...

Uber AI Labs introduces Go-Explore, a new reinforcement learning algorithm for solving a variety of challenging problems, especially in robotics.

### An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution

As powerful and widespread as convolutional neural networks are in deep learning, AI Labs’ latest research reveals both an underappreciated failing and a simple fix.

## Research Papers

### Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

**R. Wang**, **J. Lehman**, **A. Rawal**, **J. Zhi**, **Y. Li**, **J. Clune**, **K. Stanley**

Creating open-ended algorithms, which generate their own never-ending stream of novel and appropriately challenging learning opportunities, could help to automate and accelerate progress in machine learning. A recent step in this direction is the Paired Open-Ended Trailblazer (POET), an algorithm that generates and solves its own challenges, and allows solutions to goal-switch between challenges to avoid local optima. Here we introduce and empirically validate two new innovations to the original algorithm, as well as two external innovations designed to help elucidate its full potential. [...]

**[PDF]**

*International Conference on Machine Learning* (**ICML**), 2020

### Evolvability ES: Scalable and Direct Optimization of Evolvability

**A. Gajewski**, **J. Clune**, **K. Stanley**, **J. Lehman**

Designing evolutionary algorithms capable of uncovering highly evolvable representations is an open challenge; such evolvability is important because it accelerates evolution and enables fast adaptation to changing circumstances. This paper introduces evolvability ES, an evolutionary algorithm designed to explicitly and efficiently optimize for evolvability, i.e. the ability to further adapt. [...]

**[PDF]**

*The Genetic and Evolutionary Computation Conference* (**GECCO**), 2019

### Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their...

**R. Wang**, **J. Lehman**, **J. Clune**, **K. Stanley**

While the history of machine learning so far encompasses a series of problems posed by researchers and algorithms that learn their solutions, an important question is whether the problems themselves can be generated by the algorithm at the same time as they are being solved. [...]

**[PDF]**

*2019*

### Go-Explore: a New Approach for Hard-Exploration Problems

**A. Ecoffet**, **J. Huizinga**, **J. Lehman**, **K. Stanley**, **J. Clune**

A grand challenge in reinforcement learning is intelligent exploration, especially when rewards are sparse or deceptive. Two Atari games serve as benchmarks for such hard-exploration domains: Montezuma's Revenge and Pitfall. On both games, current RL algorithms perform poorly, even those with intrinsic motivation, which is the dominant method to improve performance on hard-exploration domains. [...]

**[PDF]**

*2019*

### An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents

**F. Such**, **V. Madhavan**, **R. Liu**, **R. Wang**, P. Castro, **Y. Li**, L. Schubert, M. Bellemare, **J. Clune**, **J. Lehman**

Much human and computational effort has aimed to improve how deep reinforcement learning algorithms perform on benchmarks such as the Atari Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and investigating and comparing the representations learned by different families of reinforcement learning (RL) algorithms. [...]

**[PDF]**

*2018*

### An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution

**R. Liu**, **J. Lehman**, **P. Molino**, **F. Such**, **E. Frank**, **A. Sergeev**, **J. Yosinski**

Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x,y) Cartesian space and one-hot pixel space. [...]
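The fix the paper proposes, CoordConv, gives the convolution access to its own input coordinates by concatenating coordinate channels onto the feature map. A minimal NumPy sketch of that channel construction (shapes and names here are illustrative, not the paper's code):

```python
import numpy as np

def add_coord_channels(x):
    """Append normalized (i, j) coordinate channels to a batch of images.

    x: array of shape (batch, height, width, channels).
    Returns shape (batch, height, width, channels + 2), where the two new
    channels hold row and column positions scaled to [-1, 1].
    """
    batch, h, w, _ = x.shape
    ii = np.linspace(-1.0, 1.0, h).reshape(1, h, 1, 1)   # row coordinate
    jj = np.linspace(-1.0, 1.0, w).reshape(1, 1, w, 1)   # column coordinate
    ii = np.broadcast_to(ii, (batch, h, w, 1))
    jj = np.broadcast_to(jj, (batch, h, w, 1))
    return np.concatenate([x, ii, jj], axis=-1)

# Even a 1x1 convolution over the augmented input can now read off position
# directly, which is what the coordinate-transform task requires.
images = np.zeros((2, 8, 8, 3))
augmented = add_coord_channels(images)
```

Because the coordinate channels are just extra inputs, a standard convolution can learn to ignore them, so the layer loses nothing when position is irrelevant.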

**[PDF]**

*Advances in Neural Information Processing Systems* (**NeurIPS**), 2018

### The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation...

**J. Lehman**, **J. Clune**, D. Misevic, C. Adami, L. Altenberg, J. Beaulieu, P. Bentley, S. Bernard, G. Beslon, D. Bryson, P. Chrabaszcz, N. Cheney, A. Cully, S. Doncieux, F. Dyer, K. Ellefsen, R. Feldt, S. Fischer, S. Forrest, A. Frénoy, C. Gagné, L. Goff, L. Grabowski, B. Hodjat, F. Hutter, L. Keller, C. Knibbe, P. Krcah, R. Lenski, H. Lipson, R. MacCurdy, C. Maestre, R. Miikkulainen, S. Mitri, D. Moriarty, J. Mouret, A. Nguyen, C. Ofria, M. Parizeau, D. Parsons, R. Pennock, W. Punch, T. Ray, M. Schoenauer, E. Shulte, K. Sims, **K. Stanley**, F. Taddei, D. Tarapore, S. Thibault, W. Weimer, R. Watson, **J. Yosinski**

Biological evolution provides a creative fount of complex and subtle adaptations, often surprising the scientists who discover them. However, because evolution is an algorithmic process that transcends the substrate in which it occurs, evolution's creativity is not limited to nature. [...]

**[PDF]**

*2018*

### Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking...

E. Conti, **V. Madhavan**, **F. Such**, **J. Lehman**, **K. Stanley**, **J. Clune**

Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better. [...]

**[PDF]**

*ViGIL @ NeurIPS 2017* (**NeurIPS**), 2017

### Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients

**J. Lehman**, **J. Chen**, **J. Clune**, **K. Stanley**

While neuroevolution (evolving neural networks) has a successful track record across a variety of domains from reinforcement learning to artificial life, it is rarely applied to large, deep neural networks. A central reason is that while random mutation generally works in low dimensions, a random perturbation of thousands or millions of weights is likely to break existing functionality, providing no learning signal even if some individual weight changes were beneficial. [...]
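The idea behind gradient-based safe mutations can be sketched as: estimate how sensitive the network's outputs are to each weight, then shrink the random perturbation of the most sensitive weights accordingly. A toy NumPy illustration for a single tanh layer (the sensitivity estimate and scaling below are a simplified stand-in for the paper's variants, not its implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(W, X):
    # Tiny one-layer network: outputs = tanh(W @ x) for each row of X.
    return np.tanh(X @ W.T)                      # (batch, out)

def safe_mutate(W, X, sigma=0.1):
    """Perturb W, scaling each weight's perturbation down by the estimated
    sensitivity of the outputs to that weight (simplified sketch)."""
    Y = forward(W, X)                            # (batch, out)
    # d tanh(w_i . x) / d w_ij = (1 - y_i^2) * x_j ; average its magnitude
    # over the batch to get a per-weight sensitivity estimate.
    sens = np.abs((1.0 - Y ** 2)[:, :, None] * X[:, None, :]).mean(axis=0)
    perturbation = rng.standard_normal(W.shape)
    # Dividing by sensitivity makes the expected output change roughly
    # uniform across weights, instead of letting a few sensitive weights
    # dominate (and break) the network's behavior.
    return W + sigma * perturbation / (sens + 1e-6)

W = rng.standard_normal((4, 8))
X = rng.standard_normal((32, 8))
W_new = safe_mutate(W, X)
```

In the paper's setting the same principle is applied to deep and recurrent networks using backpropagated output gradients rather than this per-layer closed form.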

**[PDF]**

*The Genetic and Evolutionary Computation Conference* (**GECCO**), 2018

### Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for...

**F. Such**, **V. Madhavan**, E. Conti, **J. Lehman**, **K. Stanley**, **J. Clune**

Deep artificial neural networks (DNNs) are typically trained via gradient-based learning algorithms, namely backpropagation. Evolution strategies (ES) can rival backprop-based algorithms such as Q-learning and policy gradients on challenging deep reinforcement learning (RL) problems. [...]

**[PDF]**

*Deep RL @ NeurIPS 2018*

### ES Is More Than Just a Traditional Finite-Difference Approximator

**J. Lehman**, **J. Chen**, **J. Clune**, **K. Stanley**

An evolution strategy (ES) variant based on a simplification of a natural evolution strategy recently attracted attention because it performs surprisingly well in challenging deep reinforcement learning domains. It searches for neural network parameters by generating perturbations to the current set of parameters, checking their performance, and moving in the aggregate direction of higher reward. [...]
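The search loop described above can be sketched in a few lines of NumPy (a minimal illustration, not the paper's implementation; the `reward` objective and hyperparameters are placeholders for an RL rollout and its tuning):

```python
import numpy as np

def reward(params):
    # Placeholder objective standing in for an episode's return: higher is better.
    return -float(np.sum(params ** 2))

def es_step(params, rng, sigma=0.1, lr=0.02, n_pop=50):
    """One ES update: sample perturbations, score them, and move the
    parameters in the reward-weighted average direction of the noise."""
    noise = rng.standard_normal((n_pop, params.size))
    rewards = np.array([reward(params + sigma * eps) for eps in noise])
    # Normalize so the step size is invariant to the scale of the rewards.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    gradient_estimate = advantages @ noise / (n_pop * sigma)
    return params + lr * gradient_estimate

rng = np.random.default_rng(0)
params = np.ones(10)          # start away from the optimum at the origin
for _ in range(300):
    params = es_step(params, rng)
```

Because each perturbation is evaluated independently, the inner loop parallelizes trivially across workers, which is the source of the wall-clock speedups noted in the related entries above.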

**[PDF]**

*The Genetic and Evolutionary Computation Conference* (**GECCO**), 2018