
How to Manage a Deep Reinforcement Learning Research Team – Part 1

4th August, 2023

This blog post is the start of a series on managing research work. In it, you’ll see how I maintain a helpful list of tasks, and manage my teams to finish these tasks.

At some point in our careers as researchers, a lot of us have to switch from a one-man army mode of work to managing a team of fellow researchers. Unfortunately, many of us don’t have management education or experience.

One of the first problems that come up in this scenario is how to organize a backlog of tasks in a useful way, so everybody knows what to do and how their work fits the larger picture (which is a big part of motivating your team). 

Based on my experience, I’ll show you how to create and keep a list of tasks that frees the research team up to get things done.

But first, let’s talk about productivity.



Productivity is overrated

Professor Cal Newport, author of “So Good They Can’t Ignore You” and “Deep Work”, once wrote that “productivity is overrated”. He argued that being productive and being accomplished don’t need to go together.

This seems to be true. You probably know people who are very accomplished, even though they’re not masters of productivity. You might be one of them.

Like my father – he’s constantly distracted with calls, and has to work late hours to get anything done. Yet he still runs a successful business, and takes good care of the family.

As for me, I’m kind of a productivity geek. While in college, I read all of the big books on productivity, like “Getting Things Done”, and then organized my own productivity system.

It made me less stressed, and let me spend more quality time with friends and family. But it’s not a magic solution, and it’s not what helped me accomplish my goals. What helped me the most was persistence, and willingness to push hard to complete projects, even when it was hurting my brain.

What differentiates accomplished people from busy people is an obsession with completing projects, especially those that are hard to finish.

This is the key to the task management system that I’ll be showing you – a focus on outcomes and completion.

[Image: Managing Reinforcement Learning project page]

I learned this technique from “The Art of the Finish: How to Go From Busy to Accomplished” by Cal Newport. If you’re into this stuff, I can definitely recommend reading the whole thing.

Projects, not tasks

From now on there are no tasks, only projects. Tasks are the little things that you do. Projects are what you achieve, and that’s much, much more important. 

You can spend all week checking off tasks and feel fantastic about it, only to find out on Friday that you achieved nothing.

Projects don’t have to be big, like finishing a whole paper with underlying research and experiments in one week. A project is an atomic outcome, like “find out how backpropagation works in deep learning”.

If you divide this into small tasks (find study materials on backprop, print study materials, etc.), you might finish a lot of tasks in a week and still not know how backprop works. 

Focus too much on the task, like looking for and reading the materials, and you can miss what you should really be doing. You should put your headphones on, dive into any literature on backpropagation you can find, take notes and discuss it with colleagues to challenge your understanding.



To be accomplished, you and your team have to focus on projects, not tasks. Your goal for the workweek isn’t to do tasks, it’s to finish a project.

This may seem obvious right now, but it can really catch you. I still remember one time I had a project to implement a SqueezeNet deep neural network. After a week of work, I found myself deep in TensorFlow docs, reading about the nitty-gritty details of the softmax implementation. 

Without any working code, of course. As if that weren’t enough, it turned out that my teammates had given up waiting for me and used open-source code instead. What went wrong? I focused on my tasks so much that I became useless to my team!


Okay, now you know that projects and their outcomes are more important than tasks. But how do you define a good outcome, especially if you don’t have management experience? I prepared some example projects and their possible outcomes to give you an idea.


Reading literature

– Find an answer to a specific question, like “how does the backprop algorithm work?”, and understand it. Note that it’s not enough to read about backprop! You have to produce an outcome, like a detailed note or an explanation to a friend. If I study something bigger, I like to schedule a seminar for my team. This way we have time set aside just to share knowledge. Also, my notes are useful for revising the topic in the future. If you don’t like to write, you can prepare a talk, or make a video or a podcast. Be creative and specific with the outcomes of learning.
– You don’t even know what you don’t know (or you’re doing something called “reviewing papers”)? Then your outcome is a list of questions/problems/methods/results/etc. After that, if you need to, you can tackle each question as a project. You can decide what makes the most sense to get done, and in what order.

Doing experiments

– State a hypothesis as clearly as possible, and design an experiment that can help you confirm it, reject it, or even fail to confirm/reject it (all outcomes are equally good). Your goal is to get a tiny bit better, with each experiment, at the research problem you’re tackling. Be careful here: you could find yourself stuck honing the perfect implementation for days, which isn’t the point of the experiment at all! Each experiment’s outcome should be some bit of knowledge you gain about your problem. Focus on this.

Writing papers

– When starting with an empty document, your goal is to simply finish one paragraph, or one (sub)section. It doesn’t have to be perfect. It has to be good enough to push it to the next step: sending it for review to a supervisor, an advisor, or your teammates.
– Armed with feedback on the current version, your goal is to correct your text. Again, it doesn’t have to be perfect. Read it, and fix it as best as you can. Iterate only to the point where it’s good enough and the cost of further improvements is too big (there are other things to do).
– Note that if you focus on perfecting the whole paper from the beginning, and divide it into a very long list of atomic tasks, it can get you nowhere. You’d end up stuck on it for days or weeks, whereas the iterative approach lets you be more flexible with your work. You can leave a gap, write something else, and return to fill in the gap once you’ve done enough experimenting. This way you can use writing to push your project in the right direction when you feel stuck!

In these examples, the key is to divide a big project into smaller projects with concrete valuable outcomes. This way you can focus on finishing one project at a time, as fast as possible (not simply being busy with a lot of tasks), then evaluate the situation and repeat the process. Iteration is a very powerful tool!

Put projects on a page

Now you can create a page with your projects. Below is an example page from one of my projects on predicting football match scores, using the Google Research Football simulator as a data source and testbed. @UW and @PG are fictional initials of two team members.

  • @UW wrote down results on predicting expected score from hand-crafted features into the paper.
  • @PG cleaned Transformer end-to-end solution code for publication on GitHub.
  • XX prepared the app to run predictions for each method with Monte-Carlo estimations for comparison.
  • @PG ran the first Transformer end-to-end solution training using Simple115 observation format.
    • @PG overfitted a smaller end-to-end model to a batch of data.
  • @UW learned about Transformers training in the football games score prediction context.
  • @UW thought deeply about the project using the “An Introduction to Statistical Learning” book as food for thought to spawn new ideas for the next steps.

Above you can see a list of projects described in past simple tense (this is optional) assigned to two team members. Note that each project clearly states the outcome we aim for. Crossed out positions are finished projects, and XX indicates an unassigned project. 

This list tells the team what’s done, what everyone’s currently doing, and what we’re doing next.
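If you prefer to keep such a page in a script or notebook rather than in a document, it can be modeled as a small nested data structure. The sketch below is only one possible way to do it, not how the page above was actually maintained. The `Project` class, its field names, the `[x]`/`[ ]` markers, and the done statuses are all hypothetical and chosen for illustration; only the owners and (shortened) outcome texts come from the example page.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Project:
    """One entry on the project page: an outcome, not a task."""
    outcome: str                 # the result we aim for, in past simple tense
    owner: Optional[str] = None  # None means unassigned (shown as "XX")
    done: bool = False           # hypothetical flag standing in for "crossed out"
    subprojects: list[Project] = field(default_factory=list)


def walk(projects: list[Project]) -> Iterator[Project]:
    """Yield every project, including nested subprojects, top to bottom."""
    for project in projects:
        yield project
        yield from walk(project.subprojects)


def render(projects: list[Project], depth: int = 0) -> list[str]:
    """Return the page as indented text lines, one line per project."""
    lines = []
    for project in projects:
        mark = "[x]" if project.done else "[ ]"
        owner = project.owner or "XX"
        lines.append(f"{'  ' * depth}{mark} {owner} {project.outcome}")
        lines.extend(render(project.subprojects, depth + 1))
    return lines


# Illustrative page; which entries are done is an assumption, not from the article.
page = [
    Project("cleaned Transformer end-to-end solution code for publication on GitHub.", owner="@PG"),
    Project("prepared the app to run predictions for each method.", owner=None),
    Project(
        "ran the first Transformer end-to-end solution training.",
        owner="@PG",
        subprojects=[
            Project("overfitted a smaller end-to-end model to a batch of data.", owner="@PG", done=True),
        ],
    ),
]

print("\n".join(render(page)))

# Open projects nobody has picked up yet: candidates for "what we're doing next".
unassigned = [p.outcome for p in walk(page) if p.owner is None and not p.done]
print("Unassigned:", unassigned)
```

Keeping subprojects nested under their parent mirrors the indented entry on the page, and a single `done` flag is enough to answer the three questions the list is meant to answer: what is finished, who is working on what, and what is still open.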

Now let’s discuss how to work with this page.

Hub & Spoke

In a hub-and-spoke office or work arrangement, team members are exposed to ideas in hubs on a regular basis, but maintain a spoke in which to work deeply on what they encounter.

Your team members should be able to work on their projects however they want. Don’t micromanage them; it doesn’t work, especially in research, where there is no way to force a result.

Serve your teammates as a coach, not a commander. If something doesn’t go as you’d like it to, communicate clearly to your teammate what should change, and offer ways to help.

I find a good way to do this is to set a Scrum-like daily meeting. In it, I start by reviewing the project page. Then, we talk about what was done since the last meeting, about the problems we found and how to solve them, and we update our project page. 

Often we might reassign projects, or divide them into even smaller subprojects, and rearrange their order. We adjust to the new data that we, as a team, gained.

Some tips for meetings

We’re currently in the coronavirus pandemic, which has forced us to meet only online. Working remotely is pretty standard nowadays, so you’ve probably done it to some extent even before the pandemic. Over time, I’ve learned a few things about organizing meetings, both online and face-to-face:

  • Everyone has to take these meetings seriously. Be present at the meeting, sync with your teammates, don’t do stuff in the background!
  • Don’t be late!
  • Find a quiet place, and mute your microphone when not talking.
  • Get prepared before the meeting. Respect other people’s time: don’t search for the results you want to present while everybody is waiting.
  • Always organize meetings with the minimum number of members required. You can start the meeting together, get through the stuff that everyone should hear, and then meet in smaller groups to talk about specific topics.
  • Don’t schedule meetings longer than required! It’s either up to 10-15 minutes and “let’s discuss this one thing”, or it’s closer to 1h and a general sync with brainstorming, ending with an update to the project page. I find that most other types of meetings are a waste of time.
  • Don’t hesitate to end the meeting as soon as you see the discussion isn’t producing value anymore.


Tasks, or “next actions” as I’d call them, can still be useful. When you see a raw outcome like “finish the experimental section of the paper”, it might be daunting to even start.

However, then you can define a next action – “the next specific, concrete thing you can do now to move a project forward”. In this case, it could be “create a new LaTeX document with the conference template”. 

If this doesn’t help you get started, then start a timer for 20 minutes and tell yourself: “I’ll work on this project for 20 minutes only, starting with this next action”. 

If at the end of the 20-minute session you feel like standing up and getting coffee, go for it. However, more often than not you’ll find yourself in the flow, with nothing on your mind but the project.

All in all, never forget what’s really important: completing the project!
