How To Manage a Deep Reinforcement Learning Research Team Part 2: Fractal Nature of Creative Work
In part one, we discussed the importance of focusing on a well-defined project. In this article, we’re diving even deeper, because we’re going to talk about the fractal nature of creative work, or why it’s hard to do meaningful work when your projects are built from sub-projects, which are built from sub-sub-projects, that are built from… you know the drill.
We’ll start with an example from my research work, then I’ll explain this phenomenon, and talk about how to deal with it. After you read this article, it would be great to discuss it on LinkedIn, because I haven’t found many articles about this, and yet I believe this is a pretty important and common part of managing research projects.
All work, no results
Illustrating the problem
My example project is about finding a new exploration algorithm for RL agents. I want it to be Bayesian (or at least inspired by Bayesian methods), since Bayesian methods have made a comeback and have recently been gaining traction in the AI community.
This is a big project, so it’s best to divide it into smaller parts. I start by contacting my advisors and looking through the literature. Deep Exploration catches my eye. I read about it, catch up on Deep Ensembles and Bootstrap, and code a few experiments.
Then, I have an idea, and it’s Max-entropy RL. I start to dive deeper into SAC to really understand what it’s doing, and find myself in the middle of the “simplify RL” discussion. This is interesting: I code it, it works very well, and it is indeed simple.
Meanwhile, I start to experiment with creating new, difficult exploratory continuous control environments in MuJoCo. I build a couple of agents that mix Max-entropy RL with Deep Exploration, and I decide that SOTA algorithms already do very well in MuJoCo, so there’s no point in making another algorithm that will run a little faster to the right in, say, HalfCheetah. I start to experiment with Humanoid, and find that Hierarchical RL has nice Ant environments, where the agent has to figure out how to push a box in order to get to the goal.
Okay, all right, so I jump into Hierarchical RL and start reviewing the literature. I find this paper, which shows that Hierarchical RL is okay (sometimes), but that there are simpler algorithms that solve Ant environments. These algorithms derive from the idea of Deep Exploration and ensemble/bootstrap networks.
Great, we’ve come full circle after a couple of months of work!
So that’s the main problem. A lot of starting and not much finishing. I got caught in the trap of the fractal nature of creative work!
Fractal nature of creative work
“Fractal (lat. fractus – broken, partial, fractional) in the colloquial sense usually means a self-similar object (i.e. one whose parts are similar to the whole) or “infinitely complex” (showing more and more complex details under any great magnification)” ~ Wikipedia
My observation is this: each idea you want to bring to life, each topic you want to learn about, each (sub-)project you want to finish is an entire world in itself. CGP Grey talks about it in the Cortex podcast (episode #96 “Levels, Levels”).
When you do creative work, each step deeper into the topic doesn’t narrow down the scope of your research. Instead, it reveals in front of you a space of new threads as big as the original topic itself.
Diving deeper and deeper into the problem at hand, you might find yourself in an infinite project of researching stuff. If you’re lucky, you’ll bump into something that can be published.
However, it often happens that in order to get to something worthwhile, you have to push one, and only one, topic much harder than simply identifying its branches and jumping further down the hole.
How do I deal with this problem?
Although the exploration stage is impossible to skip, at some point you have to stop exploring, focus, and get deliberate. Project lock can help you deal with the complexity.
Project lock is when you consciously lock onto one of the discovered (sub-)projects for a while. Please take a look at the infographic below.
The Explore stage is pretty much the process I described in the previous section. You read, talk, take notes, experiment, play, etc. a lot during this stage. You survey the ground in a more or less systematic way. However, it’s important to go to the next stage as soon as you have a gut feeling that you’ve found something.
In the Lock stage, it’s time to organize the creative mess that formed during exploration. In my system, I put all the projects I spawned together in a priority queue. The project on top has the highest priority and I lock onto it.
Remember that each project has to clearly state the outcome you want to achieve (see my previous post). Failure to achieve something is an outcome too! And it’s a very popular one when you’re doing research.
The Push stage is very important and often skipped. This stage differentiates good researchers from mediocre ones.
Work hard on the top project from the queue. Push for the result. Don’t allow for distractions!
As you work on this project, you might discover more projects. Queue them according to your intuition, but always below the current top project! Again, push the top one and only the top one.
When you finally get to the outcome of the current project (or you consciously decide it’s not worth pushing further because of the new information you’ve acquired), check it off and go to the next project in the queue.
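To make the queue mechanics concrete, here is a minimal Python sketch of how such a system could look. It is only an illustration, not the tool I actually use: the names `Project`, `ProjectQueue`, `add`, `lock`, and `finish` are hypothetical, and I assume that a lower number means a higher priority.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Project:
    priority: int                         # assumption: lower number = higher priority
    order: int                            # tie-breaker: insertion order
    name: str = field(compare=False)
    outcome: str = field(compare=False)   # the outcome you want to achieve


class ProjectQueue:
    """Hypothetical priority queue with a single locked project."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()
        self.locked = None   # the project currently being pushed
        self.done = []       # (name, result) pairs, failures included

    def add(self, name, outcome, priority):
        # A project discovered while locked is always queued
        # below the locked one, whatever priority was requested.
        if self.locked is not None:
            priority = max(priority, self.locked.priority + 1)
        project = Project(priority, next(self._counter), name, outcome)
        heapq.heappush(self._heap, project)

    def lock(self):
        # Lock onto the top project; keep the current lock if there is one.
        if self.locked is None and self._heap:
            self.locked = heapq.heappop(self._heap)
        return self.locked

    def finish(self, result):
        # Check off the locked project (success or a conscious stop),
        # then lock onto the next one in the queue.
        if self.locked is None:
            raise RuntimeError("no project is locked")
        self.done.append((self.locked.name, result))
        self.locked = None
        return self.lock()


queue = ProjectQueue()
queue.add("Deep Exploration", "reproduce ensemble baseline", priority=1)
queue.add("Max-entropy RL", "working SAC agent", priority=2)

current = queue.lock()
print(current.name)   # Deep Exploration

# A shiny new idea shows up mid-project: it cannot jump the queue.
queue.add("Hierarchical RL", "literature review", priority=0)
print(queue.lock().name)   # still Deep Exploration

nxt = queue.finish("baseline reproduced")
print(nxt.name)   # Max-entropy RL
```

The one design choice that matters here is in `add`: a new project can never outrank the locked one, which is what keeps you pushing instead of jumping to the next branch.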
You should iterate on the process above, meaning that at any time you can decide to go back to a previous stage, e.g. from Push to Lock or even to Explore. Why would you do that? Because you learn! You have new information and you should adapt.
What’s the point of this three-step process then, you ask? It’s a framework that is useful to have in your head to systematize your work. Without it, you are at risk of getting stuck in the “Explore” stage indefinitely. You can freely choose when you switch stages, but always remember that there are three of them and all are crucial for success.
Speaking of success, don’t forget about your goal in the big picture, e.g. your research should end with a published paper. Set a clear definition of done (e.g. the paper is published at this or that conference) and direct each iteration towards this goal. This framework can be fun and make you feel in control, which is pleasant, but at some point you have to finish with the less pleasant writing and reviewers’ feedback 😉
That’s it for today. Now, I’d like to know your thoughts. Do you get caught up in creative fractals? Share your experiences and ways to deal with this – you can message me on LinkedIn.