Case Study

How TH Köln Avoided the (Many) Errors of Manual Result Comparison With Neptune

My productivity in collaborating with students and also my own research speed increased dramatically. I wouldn’t know how to do my work without Neptune.

Jan Bollenbacher

Research Assistant at TH Köln

Before

Manual tracking process that was extremely prone to mistakes

No way to share results within the team

After

Out-of-the-box experiment tracker integrated with their workflow

All experiment data accessible to the whole team within 1-2 minutes

TH Köln is Germany’s biggest University of Applied Sciences. Jan Bollenbacher works in its Electrical Engineering Department as a member of staff while also pursuing his PhD. One of the department’s goals is to help students understand, experiment with, and make use of tools that could accelerate technological projects.

When it comes to AI and ML, the department is focused on meta-learning research with standard ML frameworks such as TensorFlow and PyTorch.

*Deutz Campus, TH Köln | Source: TH Köln*

The challenge

As most academic researchers will agree, working with colleagues on research papers while managing multiple student projects is not an easy task, especially in research departments that handle large-scale projects.

Like many research teams, Jan and his team tracked experiments across multiple servers by manually creating CSV files to record the details generated during each experiment run, such as losses, metrics, hyperparameters, and other configuration values.

To analyze the data, they had to download every CSV file from every server, load all of it in Python, and plot it. Because so many files were managed by hand, analyzing past experiments was extremely challenging and prone to errors.
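The manual workflow described above can be sketched roughly as follows. This is only an illustration, not the team’s actual scripts: the directory layout (one folder per server), the column names `epoch` and `loss`, and the demo values are all assumptions made up for the example, and the plotting step is replaced by a simple per-run summary.

```python
import csv
import tempfile
from pathlib import Path


def load_runs(root: Path) -> list[dict]:
    """Read every metrics CSV under root (assumed layout: <server>/<run>.csv)."""
    rows = []
    for csv_path in sorted(root.glob("*/*.csv")):
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                rows.append({
                    "server": csv_path.parent.name,
                    "run": csv_path.stem,
                    "epoch": int(row["epoch"]),
                    "loss": float(row["loss"]),
                })
    return rows


def best_loss_per_run(rows: list[dict]) -> dict:
    """Reduce the merged rows to the lowest loss seen in each run."""
    best = {}
    for row in rows:
        key = (row["server"], row["run"])
        best[key] = min(best.get(key, float("inf")), row["loss"])
    return best


def write_demo_files(root: Path) -> None:
    """Create hypothetical CSV files standing in for downloaded server logs."""
    demo = {
        "server_a/run_01.csv": [(1, 0.90), (2, 0.55)],
        "server_b/run_02.csv": [(1, 0.80), (2, 0.62)],
    }
    for relative_path, records in demo.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["epoch", "loss"])
            writer.writerows(records)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_demo_files(root)
    rows = load_runs(root)
    summary = best_loss_per_run(rows)
    for (server, run), loss in sorted(summary.items()):
        print(f"{server}/{run}: best loss {loss:.2f}")
```

Even in this small form, every step depends on files being copied to the right place and sharing the same column names, which is where a manual process tends to break down as the number of servers and runs grows.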

Deciding between an open-source and a managed solution

Jan initially explored open-source solutions for metadata collection, such as Sacred, to support the team’s research projects, but he quickly ran into significant drawbacks. Complex installation processes consumed substantial time and also created ongoing security risks and resource allocation issues. Delayed updates and the need for frequent system resets added to these challenges, pulling focus away from core machine learning research and toward tedious administrative tasks.

Seeking a more efficient alternative, Jan discovered Neptune through a tech forum, drawn by its promise of minimal setup time and seamless integration. Neptune’s solution was operational within minutes, requiring only a few lines of code to integrate into their existing workflows, significantly reducing the administrative overhead associated with open-source alternatives.

This ease of use, coupled with robust documentation and responsive support, made Neptune an ideal choice. Jan and his team could now devote more resources to actual research without the constant distractions of system maintenance and security concerns typical of open-source tools.

I really liked the open-source community and the open-source thinking. But after spending days or weeks with installation, there might be an update coming which blows up everything. Therefore, you wouldn’t do updates frequently which could be a security concern.

Neptune is well documented and very easy to use. In five or ten lines of code that you need to implement your project, you will have something like a logger right away. Then you just would need to have about five different places where you need to implement Neptune. And in case of any issues, the support team is always helpful.

Jan Bollenbacher, Research Assistant at TH Köln

Centralized metadata management for multi-server workflow

Since Jan’s department handles large research projects, experiments run across multiple servers. Each of those servers generates data that must be recorded. Without a centralized location for logging and analysis, this was time-consuming and exhausting for the researchers. Additionally, a significant incident involving a power outage highlighted the vulnerability of their local data storage approach.

In response, the team moved to Neptune, a cloud-based tool that centralized the logging and analysis of their research data. The move ensured that all data was securely backed up and easily accessible in one location, improving both data integrity and accessibility.

Smooth collaboration with proper access control

Collaboration was another important problem for the new solution to tackle. Since Jan’s team worked across multiple servers, every user had an account on the servers. However, only Jan, the project administrator, could view and retrieve information from every file. The students couldn’t access Jan’s data or projects and were prone to repeating mistakes that Jan might have already fixed. To avoid this, the team would have had to set up something like Dropbox, which is not ideal for research collaboration.

With Neptune, everyone can view the results, and Jan can properly manage access to the workspace and projects. Also, Neptune’s simple persistent URL generation for specific views or visualizations significantly enhanced communication between Jan and his students, streamlining the onboarding process and ensuring continuity even when students left the project.

This platform resolved access issues and enabled a robust sharing system, maintaining the integrity and utility of experimental data across diverse research projects. Plus, whenever someone leaves the team, the team still has access to their historical data and knowledge.

That’s a function I use often with students. They send me a link and when I click on the link, I know what they are saying and which experiment they are talking about.

Jan Bollenbacher, Research Assistant at TH Köln

Out-of-the-box advanced experiment analysis capabilities

As described above, before Neptune the process of analyzing results was very manual, with many steps and a lot of room for error. Now, instead of working with multiple CSV files and spreadsheets, Jan and his team have all the analytical capabilities available in Neptune.

The quick and intuitive query language in Neptune’s user interface means that Jan can effortlessly pull up relevant data on the fly, enhancing discussions and decision-making processes.

And Neptune’s ability to save filtered views simplified the management of multiple experiments. Researchers no longer need to manually copy queries between experiments or recreate presentation views. Instead, they can select different views in Neptune and automatically apply them to new data sets, thus maintaining consistency across experiments with minimal effort.

These features, combined with Neptune’s detailed, interactive visuals and flexible structure, allowed the team at TH Köln to efficiently compare and analyze experiments, fostering a more productive and engaged research environment.

For each item, I write search queries in Neptune and get all the results into one graph. I also query the Neptune database during the meetings because sometimes you need additional info on the go. It’s very easy to query data, and I extensively use the tags and the tagging function is really helpful for me.

Jan Bollenbacher, Research Assistant at TH Köln

The results

Real-time data querying enhances meeting efficiency and decision-making.
Significantly reduced clerical workload by automating query and visualization processes.
Improved experiment tracking with a centralized, easily accessible data repository.
Smooth and secure team collaboration thanks to user access management and shareable URLs.

Thanks to Jan Bollenbacher for his help in creating this case study!