How To Manage a Deep Reinforcement Learning Research Team Part 2: Fractal Nature of Creative Work

In part one, we discussed the importance of focusing on a well-defined project. In this article, we dive deeper into the fractal nature of creative work: why it’s hard to do meaningful work when your projects are built from sub-projects, which are built from sub-sub-projects, which are built from… you know the drill.

We’ll start with an example from my research work, then I’ll explain this phenomenon, and talk about how to deal with it. After you read this article, it would be great to discuss it on LinkedIn, because I haven’t found many articles about this, and yet I believe this is a pretty important and common part of managing research projects.



All work, no results

Illustrating the problem

My example project is about finding a new exploration algorithm for RL agents. I want it to be Bayesian (or at least inspired by Bayesian methods), since Bayesian methods have made a comeback and have recently been gaining traction in the AI community.

This is a big project, so it’s best to divide it into smaller parts. I start by contacting my advisors and looking through the literature. Deep Exploration catches my eye. I read about it, catch up on Deep Ensembles and Bootstrap, and code a few experiments.

Then I have an idea: Max-entropy RL. I dive deeper into SAC to really understand what it’s doing, and find myself in the middle of the “simplify RL” discussion. This is interesting, so I code it up. It works very well, and it is indeed simple.

Meanwhile, I start to experiment with creating new, difficult exploratory continuous control environments in MuJoCo. I build a couple of agents that mix Max-entropy RL with Deep Exploration, and I decide that SOTA algorithms already do very well in MuJoCo, so there’s no point in making another algorithm that runs a little faster to the right in, say, HalfCheetah. I start to experiment with Humanoid, and find that Hierarchical RL has nice Ant environments, where the agent has to figure out how to push a box in order to get to the goal.

Okay, all right, so I jump into Hierarchical RL and start reviewing the literature. I find a paper which shows that Hierarchical RL is okay (sometimes), but that simpler algorithms also solve the Ant environments. These algorithms derive from the idea of Deep Exploration and ensemble/bootstrap networks.

Great, we’ve come full circle after a couple of months of work!

So that’s the main problem. A lot of starting and not much finishing. I got caught in the trap of the fractal nature of creative work!

Fractal nature of creative work

“Fractal (lat. fractus – broken, partial, fractional) in the colloquial sense usually means a self-similar object (i.e. one whose parts are similar to the whole) or “infinitely complex” (showing more and more complex details under any great magnification)” ~ Wikipedia

My observation is this: each idea you want to bring to life, each topic you want to learn about, each (sub-)project you want to finish is an entire world in itself. CGP Grey talks about it in the Cortex podcast (episode #96 “Levels, Levels”).

When you do creative work, each step deeper into the topic doesn’t narrow down the scope of your research. Instead, it reveals a space of new threads as big as the original topic itself.

As you dive deeper and deeper into the problem at hand, you might find yourself in an infinite project of researching stuff. If you’re lucky, you’ll bump into something that can be published.

However, to get to something worthwhile, you often have to push one, and only one, topic much harder, rather than simply identifying its branches and jumping further down the hole.

How do I deal with this problem?

Although you can’t skip the exploration stage, at some point you have to stop exploring, focus, and get deliberate. A project lock can help you deal with the complexity.

Project lock

Project lock is when you consciously lock onto one of the discovered (sub-)projects for a while. Please take a look at the infographic below.

[Infographic: Managing the Fractal Nature of Creative Work]

Explore

This is pretty much the process I described in the previous section. You read, talk, take notes, experiment, and play a lot during this stage. You survey the terrain in a more or less systematic way. However, it’s important to move on to the next stage as soon as you have the gut feeling that you’ve found something.

Lock

It’s time to organize the creative mess that formed during the exploration stage. In my system, I put all the projects I spawned together in a priority queue. The project on top has the highest priority and I lock onto it.

Remember that each project has to clearly state the outcome you want to achieve (see my previous post). Failure to achieve something is an outcome too! And it’s a very common one when you’re doing research.

Push

This is a very important and often skipped stage. This stage differentiates good researchers from mediocre ones.

Work hard on the top project from the queue. Push for the result. Don’t allow for distractions!

As you work on this project, you might discover more projects. Queue them according to your intuition, but always below the current top project! Again, push the top one and only the top one.

When you finally get to the outcome of the current project (or you consciously decide it’s not worth pushing further because of the new information you’ve acquired), check it off and go to the next project in the queue.
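
The Lock and Push stages can be sketched in a few lines of Python. This is only an illustration of the idea, not a tool I actually use: the names `Project` and `ProjectQueue`, the numeric priorities, and the example outcomes are all made up for this sketch. It assumes that a lower number means a higher priority and that any project discovered while you are locked onto another one is forced below it.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Project:
    priority: int                         # lower number = higher priority
    order: int                            # tie-breaker: insertion order
    name: str = field(compare=False)
    outcome: str = field(compare=False)   # the result you want to reach


class ProjectQueue:
    """Hypothetical helper: a priority queue with a single locked project."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()
        self.locked = None  # the project currently being pushed

    def add(self, name, outcome, priority):
        if self.locked is not None:
            # Ideas found while pushing always go below the locked project.
            priority = max(priority, self.locked.priority + 1)
        project = Project(priority, next(self._counter), name, outcome)
        heapq.heappush(self._heap, project)

    def lock(self):
        # Lock onto the top project; keep the current lock if there is one.
        if self.locked is None and self._heap:
            self.locked = heapq.heappop(self._heap)
        return self.locked

    def finish(self):
        # Check off the locked project (outcome reached or consciously dropped).
        done, self.locked = self.locked, None
        return done


queue = ProjectQueue()
queue.add("Deep Exploration", "reproduce an ensemble baseline", priority=2)
queue.add("Max-entropy RL", "SAC running on HalfCheetah", priority=1)
print(queue.lock().name)     # the project to push right now

# A shiny new idea shows up mid-push: it is queued, but below the locked one.
queue.add("Hierarchical RL", "literature review", priority=0)
print(queue.lock().name)     # still the same project

queue.finish()
print(queue.lock().name)     # only now do we move on to the next project
```

The point of the `locked` attribute is that calling `lock()` again does nothing until you call `finish()`, which mirrors the rule above: push the top project, and only the top project.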

Iterate

You should iterate over the process above, meaning that at any time you can decide to go back to a previous stage, e.g. from Push to Lock or even to Explore. Why would you do that? Because you learn! You have new information and you should adapt.

What’s the point of this three-step process then, you ask? It’s a framework that is useful to keep in your head to systematize your work. Without it, you risk getting stuck in the “Explore” stage indefinitely. You can freely choose when to switch stages, but always remember that there are three of them and all are crucial for success.

Finish

Speaking of success, don’t forget about your goal in the big picture, e.g. your research should end with a published paper. Set a clear definition of done (e.g. the paper is published at a specific conference) and direct each iteration towards this goal. This framework can be fun and make you feel in control, which is pleasant, but at some point you have to finish with the less pleasant writing and reviewers’ feedback 😉

Discussion

That’s it for today. Now, I’d like to know your thoughts. Do you get caught up in creative fractals? Share your experiences and ways to deal with this – you can message me on LinkedIn.

