Case Study

InstaDeep

I like that Neptune does not get in your way – it is not very intrusive. It also does very well with the comparison of runs, sharing, and working collaboratively.
Nicolas Lopez Carranza
DeepChain and BioAI Lead at InstaDeep

InstaDeep is an EMEA leader in delivering decision-making AI products. Leveraging their extensive know-how in GPU-accelerated computing, deep learning, and reinforcement learning, they have built products, such as the novel DeepChain™ platform, to tackle the most complex challenges across a range of industries. 

InstaDeep | Source

InstaDeep has also developed collaborations with global leaders in the AI ecosystem, such as Google DeepMind, NVIDIA, and Intel. They are part of Intel’s AI Builders program and are one of only 2 NVIDIA Elite Service Delivery Partners across EMEA. The InstaDeep team is made up of approximately 155 people working across its network of offices in London, Paris, Tunis, Lagos, Dubai, and Cape Town, and is growing fast.

About the BioAI team

The BioAI team is the place at InstaDeep where biology meets artificial intelligence. At BioAI, they advance healthcare and push the boundaries of medical science through a combination of biology and machine learning expertise. They are currently building DeepChain™, their platform for protein design. They are also working with their customers in the bio sector to tackle the most challenging problems with the help of bioinformatics and machine learning.

DeepChain dashboard | Source

They apply the DeepChain™ protein design platform to engineer new sequences for protein targets using sophisticated optimization techniques such as reinforcement learning and evolutionary algorithms. They also leverage language models pre-trained on millions of protein sequences and train their own in-house protein language models. Finally, they use machine learning to predict protein structure from sequence.

Problem

Building complex software like DeepChain™, a platform for protein design, requires a lot of research with many moving parts. Customers need different types of solutions, each requiring new experiments and research. With several experiments running for different customers, keeping track of them all while staying productive is daunting for a team of any size.

Faced with the prospect of managing numerous experiments, Nicolas and the BioAI team encountered a series of challenges:

  1. Experiment logs were all over the place
  2. It was difficult to share experiment results
  3. Machine learning researchers were dealing with infrastructure and operations

  • “Logs ended up all over the place as there was no centralized repository for the data.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

    Dealing with an enormous volume of experiment logs is hard enough; it is harder still when they are not organized. Finding experiment results became a huge challenge for the team because there was hardly any visibility. With logs scattered across documents and files, experiments became difficult to manage, and the team spent more time figuring out where results were than doing the actual research. Comparing against previous runs took engineers and researchers a long time because they first had to find the log file where the earlier result had been recorded.

  • “When you use TensorBoard or equivalent tools, you have to deal with the extra DevOps stuff of exposing your localhost somehow if you wish to share your results.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

    Working on many experiments requires researchers to collaborate and share results. That is hard when logs are disorganized, and harder still when there is no easy way to share them. For the BioAI team, sharing results from TensorBoard required workarounds, and obstacles like this often broke a researcher’s flow. If a researcher has to expose a URL to their experiment results and make sure it is secure (especially for sensitive work) before sharing them with a colleague, the urge to collaborate inevitably drops.

  • “I’d say the advantage (of TensorBoard) is that it’s free and it works pretty well but anytime an engineer wanted to show the team some training curve, they’d need to start the VM (Virtual Machine) containing the logs, or make their localhost port available, expose it to the internet, it was not very secure… When you end up having to start a VM just to visualize some logs, you realize there should be a better tool.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

    Researchers are trained primarily to do research, and having to configure infrastructure before they can do their job is a drag on productivity. This was one of the challenges the BioAI team encountered when they used TensorBoard to manage their experiments. They often had to handle operational work, including spinning up and managing the infrastructure for TensorBoard, before they could visualize experiment results. Anyone who has had to configure infrastructure for a piece of software knows this is far from an easy task, even for an operations engineer.

Solution

Plagued by these challenges, Nicolas decided to look for a solution that could help the team solve their experiment management problems and use their time and effort more efficiently.

“We needed a tool that could expose and share TensorBoard-like dashboards between the team, and store the logs of previous runs.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

In searching for a solution for the team, Nicolas also needed a tool that could fit the following criteria:

  1. Easy to use
  2. Simple to connect to TensorFlow and PyTorch logs
  3. Not too expensive

It turned out that some of the engineers on the BioAI team were already using a tool for their projects that fit the criteria Nicolas outlined.

“Over time we realised that our ML engineers were utilising Neptune even for small personal projects so we realised it was the right approach.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

Neptune ended up being the tool the team adopted, not just because it was popular among the engineers on the team but also because it met the BioAI team’s requirements for an experiment management solution.

“Before Neptune, we did not have a similar platform. I know there are other tools on the market and some people on the team have used those. But most of our team members already had Neptune accounts and knew the tool. We had to standardize our tool stack and decided to go with Neptune.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

Neptune proved to be the ideal solution for the team because:

  • It was accessible
  • It provided more visibility for experiment logs
  • It enabled straightforward collaboration between researchers and engineers on the team
  • It eliminated the need for operations and infrastructure configuration

Accessibility

Given the challenges they had faced with TensorBoard, the team needed to make sure Neptune was accessible and easy to use, and it turned out it was. As Nicolas explained when describing his experience getting started with Neptune:

“The documentation is quite detailed and offers code snippets to get started hence in less than 1 hour, we were logging our machine learning projects.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

The team also found the Neptune SDK was complete and useful for their workflow.

“Well, I think the documentation is very good, so you can get started very easily. You will just create your token and the SDK is also quite complete. I think these are the main advantages – you can get started very quickly.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

Visibility for experiment logs

One of the challenges the team faced before using Neptune was that logs were scattered and unorganized, which made experiment results hard to find. This was a crucial consideration for the BioAI team: they needed a solution that managed all their experiments in a central repository so that logs could be easily tracked.

Neptune provided more visibility into experiment logs by centralizing, indexing, and organizing all experiment runs. This way, the team could view details of various experiment runs, search for specific runs through their metadata and tags, and see visualizations related to specific experiments.
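
The idea of a central, searchable run index can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Neptune’s API or implementation: the names `RunRecord`, `RunIndex`, and `search`, along with the run IDs, tags, and metric values, are hypothetical and made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """One experiment run: an ID, free-form tags, and logged metadata."""
    run_id: str
    tags: set[str] = field(default_factory=set)
    params: dict[str, object] = field(default_factory=dict)
    metrics: dict[str, float] = field(default_factory=dict)


class RunIndex:
    """Hypothetical in-memory stand-in for a central run repository."""

    def __init__(self) -> None:
        self._runs: dict[str, RunRecord] = {}

    def log(self, run: RunRecord) -> None:
        # Every run lands in the same place, keyed by its ID.
        self._runs[run.run_id] = run

    def search(self, tags=(), **params) -> list[RunRecord]:
        """Return runs that carry all given tags and match all given params."""
        wanted = set(tags)
        return [
            run for run in self._runs.values()
            if wanted <= run.tags
            and all(run.params.get(k) == v for k, v in params.items())
        ]


# Illustrative data only; these are not results from the case study.
index = RunIndex()
index.log(RunRecord("RUN-1", {"baseline"}, {"lr": 1e-3}, {"val_loss": 0.42}))
index.log(RunRecord("RUN-2", {"protein-lm"}, {"lr": 1e-3}, {"val_loss": 0.37}))
index.log(RunRecord("RUN-3", {"protein-lm", "fine-tune"}, {"lr": 1e-4}, {"val_loss": 0.31}))

# Find runs by tag, then compare them on a shared metric.
candidates = index.search(tags=["protein-lm"])
best = min(candidates, key=lambda run: run.metrics["val_loss"])
print(best.run_id, best.metrics["val_loss"])
```

Because every run is logged to one place with its tags and parameters attached, finding and comparing earlier results becomes a query rather than a hunt through scattered log files.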

Neptune dashboard | Source

“I think Neptune does one thing very well, which is get your logs and charts right where and when you need them… The search feature of looking for runs and using the tags for the runs is very good as well. The idea of tagging experiment runs is very useful.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

Straightforward collaboration

Recall the problem the BioAI team had with sharing results, when they had to expose localhost to share experiment results. Neptune solved that challenge by providing a password-protected, easily shareable link to experiment results without any extra configuration.

This made collaboration on experiments straightforward. The team could also collaborate on research through Neptune’s feature for comparing runs and metrics from experiments run by different researchers and engineers.

“I like that Neptune does not get in your way – it is not very intrusive. It also does very well with the comparison of runs, sharing, and working collaboratively.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

Eliminated the need for operations and configuring infrastructure

As noted earlier, one obstacle to the productivity of the researchers at BioAI was the time they spent configuring infrastructure and handling DevOps work before they could use a tool to manage their experiment logs. Neptune removed that need by providing a fully managed solution for the team.

“No more DevOps needed for logging. No more starting VMs just to look at some old logs. No more moving data around to compare TensorBoards.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

Results

Nicolas and the BioAI team have been using Neptune for about 2 years (as of the time this case study was published), and over that period the team has improved its workflow. Neptune proved to be a useful solution because:

  • “No time spent looking for the data. It’s always there, available, and displayed the way we want.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

    The visibility brought about by Neptune’s centralized and organized experiment management dashboard ensured the team spent little to no time searching for results and metadata on experiments.

  • “We use it (Neptune) daily, as a big part of what we do is sharing results and discussing them. The [team’s] productivity increased for this reason.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep

    Because Neptune eliminated most of the obstacles the team faced in managing many experiments, researchers could focus on research. The team’s productivity improved, as did their ability to collaborate on experiments.

    “I definitely recommend you give Neptune a try because it is a very useful tool and I think you will enjoy it.” – Nicolas Lopez Carranza, DeepChain and BioAI Lead at InstaDeep


Thanks to Nicolas Lopez Carranza for his help in creating this case study!
