We Raised $8M Series A to Continue Building Experiment Tracking and Model Registry That “Just Works”
"Neptune made sense to us due to its pay-per-use or usage-based pricing. Now when we are doing active experiments, we can scale up, and when we’re busy integrating all our models for a few months, we scale down again."
Hypefactors is a technology company that works in the media intelligence and reputation tracking domain. They provide an ML-based Public Relations (PR) automation platform, including all the tools to power the PR workflow and visualize the results.
We spoke to Viet Yen Nguyen, CTO of Hypefactors, who leads the company’s technical teams. He explains the business like this:
“Apple Inc. is worth approximately 2 trillion but if you go to their warehouses, stores, and sum up everything they own you don’t get to 2 trillion. You may get to maybe 60 billion worth of physical goods.
So where is the 1.94 trillion of value that is reflected in the price of the stock? It is the future prospects, the brand potential, the reputation, the intangible things.
We help teams actually measure and track that.”
What is the project about?
The data pipelines at Hypefactors monitor the whole media landscape, from social media to print media, television, radio, and more, to analyze changes in their customers’ brand reputation. This is achieved in two phases:
1. Getting the data from everywhere
2. Enriching the data with ML-based features
“We have two elements that are one-on-one. We monitor the whole media landscape and harvest the data. So we have our own web crawler to our social media pipelines to print data pipelines and broadcast. On top of that, we ingest all that in our data pipelines. And then the second part happens: we do all the enrichment over that data, for example, the changes to reputation.”
In order to analyze every form of data, including images, text, and tabular data, they work on a variety of ML problems:
- NLP classification
- computer vision (e.g. segmentation)
- regression for business metrics
As they train and improve many enrichment models using different ML techniques, this naturally involves running many experiments and devising ways to store the metadata those experiments generate.
To tackle these problems, Viet’s team had a set structure in place.
With competent engineers in the data and AI teams, Viet’s team achieved good results with all the components in the pipeline except experiment tracking. Let’s look at the problems they faced along the way and the measures they took to solve them.
Viet’s team at Hypefactors is very pragmatic: they address the problems at hand and focus on those. In the enrichment phase, there were initially few experiments to run, so experiment tracking wasn’t seen as a problem, and everyone on the team had their own setup, which broadly fell into two categories:
- Slack for collaboration: They were using Slack messages to share information and collaborate on different experiments. This included sharing artifacts and model metadata among team members.
- Personal notes/files: Engineers on Viet’s team devised their own personal strategies for storing training metadata and model artifacts on their own systems as text and configuration files. They used the same approach for comparing experiments and model configurations and sharing results.
“The two ways of how we manage is: Slack, where we share informally and people are keeping their own personal notes. So the problem with this arose when we ran many experiments. You cannot find things in all the slack messages. And even when you do you cannot be sure which model weights correspond to given notes.”
This method worked well for them until there were more models, more features, and more people working on them. As the complexity of the problem grew, there was a sudden burst of experiments. This created a bottleneck in the pipeline, since:
- Slack was no longer an efficient medium for sharing results
- Personal strategies created structural bottlenecks
- More experiments meant loads of metadata
The sudden burst in experiments meant increased communication: hundreds of variations of datasets, model architectures, and corresponding outcomes. Team members quickly lost track of checkpoints and other model metadata, since a messaging platform can only handle limited volumes of information.
“Our experimentations go in bursts. In these bursts, we face hundreds of variations of datasets, model architectures, and corresponding outcomes.” – Viet Yen Nguyen, CTO at Hypefactors
Since everybody had their own way of storing metadata, the growing number of experiments led to comparison issues. Team members had to spend more time comparing experiments whose metadata was stored in different formats; this non-uniform approach created time overhead and proved error-prone too.
“Personal strategies vary in fidelity of tracking metadata and its context. This meant that we relied too much on our memory on what experiments were run before, creating cognitive overhead.” – Viet Yen Nguyen, CTO at Hypefactors
A huge number of experiments also produced loads of metadata and model artifacts, and saving these required ever more nested directories, which proved to be much more of a hassle than anticipated.
One outcome of this was losing checkpoints along the way, as Andrea Duque, Data Scientist at Hypefactors, recalls:
“Losing track of a particular checkpoint was one of the major setbacks. The process was, we run the experiment and save the interesting checkpoints. And then when we pick it up a few weeks later, we would lose track of the metadata and other logs belonging to the particular checkpoint making the whole thing a mess.”
Viet, CTO at Hypefactors, commented on the same problem:
“In our team, people don’t do one task and repeat it over and over. It could be that let’s say one month we do no machine learning experiments at all. And then another month comes in and suddenly, like three people are doing experiments on the same problem because we have a bunch of new ideas. That creates a bottleneck for our current methods of tracking.”
This pushed Viet’s team to properly investigate the tools out there that ticked all the boxes for their requirements around feature and team growth.
Need for a tool
As Viet explained, there are times when experiments come in sudden bursts and times when hardly any run at all, for example during deployments and integrations. So they knew that:
1. They wanted a tool that could fit their needs in terms of growing features and metrics to track
2. They wanted a tool with a good price-to-value tradeoff, especially as they scale operations and usage
“The ad-hoc techniques we used weren’t effective. At some point, everybody agreed that we could do this better. And like always, I was open to suggestions. We were interested in a cost-beneficial approach which caters to our on/off needs for experiment tracking as we scale up and down.”
After trying out different tools, they shortlisted Neptune and Weights & Biases, but decided to go with Neptune. Let’s look into the reasons behind this decision.
Why did they choose Neptune?
Neptune made sense to them over W&B due to its efficient cost-benefit structure. As Viet mentioned, they run experiments only occasionally, so naturally they didn’t want to pay when they weren’t using the tool.
Neptune works to understand each customer’s requirements and value proposition, recognizing that every team is unique in its approach and can’t be confined to a one-dimensional pricing structure; a pragmatic team like Viet’s certainly appreciates that. Here’s what Viet said about choosing Neptune as the solution:
“The way we work is that we do not experiment constantly. After checking out both Neptune and Weights and Biases, Neptune made sense to us due to its pay-per-use or usage-based pricing. Now when we are doing active experiments, we can scale up, and when we’re busy integrating all our models for a few months, we scale down again.
So I discussed it with your CEO and we thought of this and he proposed an alternative solution which worked great for us.”
As the team shared with us, Neptune improved their entire workflow in a way that:
Every checkpoint is now connected to its respective experiment and can no longer be misplaced. Metrics for every experiment are stored inside the experiment itself and are easily comparable with other experiments.
“We use Neptune for most of our tracking tasks, from experiment tracking to uploading the artifacts. A very useful part of tracking was monitoring the metrics, now we could easily see and compare those F-scores and other metrics.” – Andrea Duque, Data Scientist at Hypefactors
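The pattern the team describes, with metrics and checkpoint references bound to a single experiment record so nothing gets orphaned, can be sketched in plain Python. This is a hypothetical toy tracker for illustration only, not Neptune’s API; all names and values are made up:

```python
import json
from pathlib import Path

class ExperimentRun:
    """Toy tracker: keeps metrics and checkpoint paths inside one record,
    so a checkpoint can always be traced back to its experiment."""

    def __init__(self, run_id: str, params: dict, root: str = "runs"):
        self.record = {"id": run_id, "params": params,
                       "metrics": {}, "checkpoints": []}
        self.path = Path(root) / f"{run_id}.json"

    def log_metric(self, name: str, value: float) -> None:
        # Each metric is a series stored inside the run itself,
        # which makes runs directly comparable later
        self.record["metrics"].setdefault(name, []).append(value)

    def log_checkpoint(self, ckpt_path: str) -> None:
        # The checkpoint reference lives in the same record as its metrics
        self.record["checkpoints"].append(ckpt_path)

    def stop(self) -> None:
        # Persist one self-contained record per experiment
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.record, indent=2))

# Usage: one record per experiment (hypothetical names and values)
run = ExperimentRun("exp-001", {"lr": 3e-5, "model": "bert-base"})
for f1 in (0.71, 0.78, 0.83):
    run.log_metric("f1", f1)
run.log_checkpoint("checkpoints/exp-001-best.ckpt")
run.stop()
```

Because the metric series and the checkpoint reference are written into the same record, comparing F-scores across runs or finding the checkpoint behind a result weeks later is a simple lookup rather than a search through Slack history.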
Web links to experiments can be created and shared among peers easily, cutting out the hassle of communicating through Slack.
“As opposed to before, we now post links to the Neptune results and it works great for us” – Viet Yen Nguyen, CTO at Hypefactors
Thanks to Viet Yen Nguyen and Andrea Duque for their help in creating this case study!
Want to pay for a metadata store only when you actually use it?
- Industry: Media intelligence
- Location: Copenhagen, Denmark
- Team size: 20
- Frameworks: PyTorch, Hugging Face, Hydra
- Neptune use cases: experiment tracking, experiment management, collaboration