Last week I had the pleasure of participating in the International Conference on Learning Representations (ICLR), an event dedicated to research on all aspects of representation learning, commonly known as deep learning. The conference went virtual due to the coronavirus pandemic, and thanks to the huge effort of its organizers, the event attracted an even bigger audience than last year. Their goal was to make the conference inclusive and interactive, and from my point of view as an attendee, it definitely was!
Inspired by the presentations from over 1300 speakers, I decided to create a series of blog posts summarizing the best papers in four main areas. You can catch up with the first post about the best deep learning papers here, and today it’s time for the 15 best reinforcement learning papers from ICLR.
The Best Reinforcement Learning Papers
1. Never Give Up: Learning Directed Exploration Strategies
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
(TL;DR, from OpenReview.net)
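To give a flavour of the directed-exploration idea, here is a minimal, hypothetical sketch of mixing an extrinsic reward with an intrinsic novelty bonus under a family of exploration coefficients. The count-based bonus and all names below are illustrative stand-ins, not the paper's episodic-memory mechanism.

```python
import numpy as np

# Toy novelty bonus: counts visits to discretized states and returns 1/sqrt(count).
# This is an illustrative placeholder, NOT the paper's episodic-memory / RND bonus.
class ToyNoveltyBonus:
    def __init__(self, num_bins=16):
        self.num_bins = num_bins
        self.counts = {}

    def __call__(self, state):
        key = tuple(np.floor(np.asarray(state) * self.num_bins).astype(int))
        self.counts[key] = self.counts.get(key, 0) + 1
        return 1.0 / np.sqrt(self.counts[key])


def mixed_rewards(extrinsic_reward, state, bonus_fn, betas=(0.0, 0.1, 0.3)):
    """Return one mixed reward per exploration coefficient beta,
    i.e. one training signal per policy in a family of exploratory policies."""
    intrinsic = bonus_fn(state)
    return [extrinsic_reward + beta * intrinsic for beta in betas]


if __name__ == "__main__":
    bonus = ToyNoveltyBonus()
    state, ext_r = np.array([0.42, 0.13]), 0.0
    print(mixed_rewards(ext_r, state, bonus))  # one reward per policy in the family
```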
Main authors:
2. Program Guided Agent
We propose a modular framework that can accomplish tasks specified by programs and achieve zero-shot generalization to more complex tasks.
(TL;DR, from OpenReview.net)
3. Model Based Reinforcement Learning for Atari
We use video prediction models, a model-based reinforcement learning algorithm and 2h of gameplay per game to train agents for 26 Atari games.
(TL;DR, from OpenReview.net)
Main authors:
Błażej Osiński
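As a rough, hypothetical sketch of the overall model-based loop (collect a little real experience, fit a world model, train the policy inside it), the snippet below uses stub components; it is not the paper's video-prediction architecture, and all class and function names are illustrative.

```python
import random

class WorldModel:
    def fit(self, transitions):
        pass  # e.g., train a video-prediction model on (obs, action, next_obs, reward)

    def simulate(self, obs, action):
        # return imagined (next_obs, reward); random stub for illustration
        return obs, random.random()

class Policy:
    def act(self, obs):
        return random.choice([0, 1])

    def update(self, imagined_transitions):
        pass  # e.g., a policy-gradient update on imagined rollouts

def model_based_training(env_step, initial_obs, iterations=3,
                         real_steps=100, imagined_steps=500):
    model, policy, real_data = WorldModel(), Policy(), []
    obs = initial_obs
    for _ in range(iterations):
        # 1) collect a small budget of real experience (~2h of gameplay in the paper)
        for _ in range(real_steps):
            action = policy.act(obs)
            next_obs, reward = env_step(obs, action)
            real_data.append((obs, action, next_obs, reward))
            obs = next_obs
        # 2) fit the world model to all real data collected so far
        model.fit(real_data)
        # 3) train the policy purely on imagined rollouts from the model
        imagined, sim_obs = [], obs
        for _ in range(imagined_steps):
            action = policy.act(sim_obs)
            sim_next, sim_reward = model.simulate(sim_obs, action)
            imagined.append((sim_obs, action, sim_next, sim_reward))
            sim_obs = sim_next
        policy.update(imagined)
    return policy

if __name__ == "__main__":
    toy_env_step = lambda obs, action: (obs, 0.0)   # stand-in for a real environment
    model_based_training(toy_env_step, initial_obs=(0.0,))
```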
4. Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents
We generate critical states of a trained RL algorithm to visualize potential weaknesses.
(TL;DR, from OpenReview.net)
5. Meta-Learning without Memorization
We identify and formalize the memorization problem in meta-learning and solve it with a novel meta-regularization method, which greatly expands the domain in which meta-learning is applicable and effective.
(TL;DR, from OpenReview.net)
Main authors:
6. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
Exponential lower bounds for value-based and policy-based reinforcement learning with function approximation.
(TL;DR, from OpenReview.net)
7. The Ingredients of Real World Robotic Reinforcement Learning
System to learn robotic tasks in the real world with reinforcement learning without instrumentation.
(TL;DR, from OpenReview.net)
8. Improving Generalization in Meta Reinforcement Learning using Learned Objectives
We introduce MetaGenRL, a novel meta reinforcement learning algorithm. Unlike prior work, MetaGenRL can generalize to new environments that are entirely different from those used for meta-training.
(TL;DR, from OpenReview.net)
9. Making Sense of Reinforcement Learning and Probabilistic Inference
Popular algorithms that cast “RL as Inference” ignore the role of uncertainty and exploration. We highlight the importance of these issues and present a coherent framework for RL and inference that handles them gracefully.
(TL;DR, from OpenReview.net)
10. SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
SEED RL, a scalable and efficient deep reinforcement learning agent with accelerated central inference. State of the art results, reduces cost and can process millions of frames per second.
(TL;DR, from OpenReview.net)
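The central-inference idea can be illustrated with a toy sketch: environment workers only step their environments and ship observations to a central server, which batches them through a single policy network and returns actions. The in-process queues and random policy below are simplifications standing in for SEED RL's gRPC streaming and TPU/GPU inference; they are not the library's actual API.

```python
import queue
import threading
import numpy as np

obs_queue = queue.Queue()   # actor -> learner: (actor_id, observation)
action_queues = {}          # learner -> actor: per-actor action queue

def central_inference_server(policy_fn, num_actors, batch_size=4):
    """Batch incoming observations and run one forward pass per batch."""
    pending, served = [], 0
    while served < num_actors:                    # toy stop condition: one action per actor
        pending.append(obs_queue.get())
        if len(pending) >= min(batch_size, num_actors - served):
            ids, observations = zip(*pending)
            actions = policy_fn(np.stack(observations))   # one batched forward pass
            for actor_id, action in zip(ids, actions):
                action_queues[actor_id].put(action)
            served += len(pending)
            pending = []

def actor(actor_id):
    observation = np.random.rand(4)               # stand-in for env.reset()/env.step()
    obs_queue.put((actor_id, observation))
    action = action_queues[actor_id].get()        # wait for the central policy
    print(f"actor {actor_id} got action {action}")

if __name__ == "__main__":
    num_actors = 8
    random_policy = lambda batch: np.random.randint(0, 4, size=len(batch))
    for i in range(num_actors):
        action_queues[i] = queue.Queue()
    server = threading.Thread(target=central_inference_server, args=(random_policy, num_actors))
    server.start()
    workers = [threading.Thread(target=actor, args=(i,)) for i in range(num_actors)]
    for t in workers: t.start()
    for t in workers: t.join()
    server.join()
```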
11. Multi-agent Reinforcement Learning for Networked System Control
This paper proposes a new formulation and a new communication protocol for networked multi-agent control problems.
(TL;DR, from OpenReview.net)
First author: Tianshu Chu
12. A Generalized Training Approach for Multiagent Learning
This paper studies and extends Policy-Space Response Oracles (PSRO), a population-based learning method built on game-theoretic principles. The authors extend the method so that it applies to multi-player games, while providing convergence guarantees in several settings.

First author: Paul Muller
13. Implementation Matters in Deep RL: A Case Study on PPO and TRPO
Sometimes an implementation detail can play a significant role in your research. Here, two policy search algorithms were evaluated: Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO). “Code-level optimizations” are usually assumed to have a negligible effect on learning dynamics. Surprisingly, these optimizations turn out to have a major impact on agent behavior.
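One concrete example of such a code-level optimization is value-function clipping, which appears in many PPO implementations but not in the algorithm's original description. The PyTorch sketch below is an illustration of that trick, not the authors' exact code; the tensor names are placeholders.

```python
import torch

def clipped_value_loss(values_new, values_old, returns, clip_eps=0.2):
    """PPO-style value loss that clips the update around the old value predictions."""
    # unclipped squared error
    loss_unclipped = (values_new - returns) ** 2
    # keep the new prediction within clip_eps of the old prediction
    values_clipped = values_old + torch.clamp(values_new - values_old, -clip_eps, clip_eps)
    loss_clipped = (values_clipped - returns) ** 2
    # pessimistic (elementwise max) combination, then mean over the batch
    return torch.max(loss_unclipped, loss_clipped).mean()

if __name__ == "__main__":
    v_new = torch.tensor([0.5, 1.2, -0.3], requires_grad=True)
    v_old = torch.tensor([0.4, 1.0, 0.0])
    rets = torch.tensor([1.0, 1.0, 0.0])
    print(clipped_value_loss(v_new, v_old, rets))
```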
Main authors:
14. A Closer Look at Deep Policy Gradients
This is an in-depth, empirical study of the behavior of deep policy gradient algorithms. The authors analyse state-of-the-art methods through the lens of gradient estimation, value prediction, and optimization landscapes.
Main authors:
15. Meta-Q-Learning
MQL is a simple off-policy meta-RL algorithm that recycles data from the meta-training replay buffer to adapt to new tasks.
(TL;DR, from OpenReview.net)
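As a toy illustration of “recycling” meta-training replay data, one can reweight old transitions by how much they resemble the new task, e.g. with a propensity-score-style classifier. The sketch below (with made-up feature arrays) only conveys that general idea and is not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(old_features, new_features):
    """Weight old replay transitions by how 'new-task-like' they look."""
    X = np.vstack([old_features, new_features])
    y = np.concatenate([np.zeros(len(old_features)), np.ones(len(new_features))])
    clf = LogisticRegression().fit(X, y)
    p_new = clf.predict_proba(old_features)[:, 1]   # P(transition comes from the new task)
    return p_new / (1.0 - p_new + 1e-8)             # odds ratio used as importance weight

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    old = rng.normal(0.0, 1.0, size=(500, 8))   # features of meta-training transitions
    new = rng.normal(0.5, 1.0, size=(50, 8))    # features of the new task's transitions
    w = importance_weights(old, new)
    print(w.mean(), w.max())  # old transitions resembling the new task get larger weight
```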
Main authors:
Summary
The depth and breadth of the ICLR publications are quite inspiring. Here, I presented just the tip of the iceberg, focusing on the “reinforcement learning” topic. However, as you can read in this analysis, there were four main areas discussed at the conference:
- Deep learning (covered in our previous post)
- Reinforcement learning (covered in this post)
- Generative models (here)
- Natural Language Processing/Understanding (here)
To give a fuller picture of the top papers at ICLR, we are building a series of posts, each focused on one of the topics mentioned above. You may want to check them out for a more complete overview.
Feel free to share with us other interesting papers on reinforcement learning and we will gladly add them to the list.
Enjoy reading!