Case Study

Hypefactors

Neptune made sense to us due to its pay-per-use, usage-based pricing. When we are doing active experiments, we can scale up, and when we’re busy integrating all our models for a few months, we scale down again.
Viet Yen Nguyen
CTO at Hypefactors

Hypefactors is a technology company that works in the media intelligence and reputation tracking domain. They provide an ML-based Public Relations (PR) automation platform, including all the tools to power the PR workflow and visualize the results.

Hypefactors dashboard | Source: Hypefactors

We spoke to Viet Yen Nguyen, CTO of Hypefactors, who leads the technical teams and is responsible for the technical side of the company. He explains the business as follows:

Apple Inc. is worth approximately 2 trillion, but if you go to their warehouses and stores and sum up everything they own, you don’t get to 2 trillion. You may get to maybe 60 billion worth of physical goods.

So where is the 1.94 trillion of value that is reflected in the price of the stock? It is the future prospects, the brand potential, the reputation, the intangible things.

We help teams actually measure and track that.
Viet Yen Nguyen, CTO at Hypefactors

What is the project about?

The data pipelines at Hypefactors monitor the whole media landscape, from social media to print, television, radio, and more, to analyze changes in their customers’ brand reputation. This is achieved in two phases:

  1. Getting the data from everywhere
  2. Enriching the data with ML-based features
We have two elements that go one-on-one. We monitor the whole media landscape and harvest the data: we have everything from our own web crawler to our social media pipelines to print data pipelines and broadcast. On top of that, we ingest all of it into our data pipelines. And then the second part happens: we do all the enrichment over that data, for example, the changes to reputation.
Viet Yen Nguyen, CTO at Hypefactors

To analyze every form of data, from images and text to tabular data, they work on a variety of ML problems:

  • NLP classification
  • computer vision (e.g. segmentation)
  • regression for business metrics

Training and improving many enrichment models with different ML techniques naturally involves running many experiments and finding ways to store the metadata those experiments generate.


To tackle these problems, Viet’s team had a set structure in place:

Hypefactors pipeline

With competent engineers in the data and AI teams, Viet’s team achieved good results with all the components in the pipeline except for experiment tracking. Let’s look at the problems they faced along the way and the measures they took to solve them.

Problem

Viet’s team at Hypefactors is very pragmatic: they address the problems at hand and focus on those. Initially, there were fewer experiments to run in the enrichment phase, so experiment tracking wasn’t seen as a problem, and everyone on the team had their own setup, which can broadly be classified into:

  • Slack for collaboration: They used Slack messages to share information and collaborate on different experiments. This included sharing artifacts and model metadata among team members.
  • Personal notes/files: Engineers on Viet’s team devised their own personal strategies to store training metadata and model artifacts on their own machines as text and configuration files. They used the same approach to compare experiments and model configurations and to share results.
The two ways we managed it were Slack, where we shared informally, and people keeping their own personal notes. The problem with this arose when we ran many experiments. You cannot find things in all the Slack messages, and even when you do, you cannot be sure which model weights correspond to which notes.
Viet Yen Nguyen, CTO at Hypefactors

This method worked well for them until there were more models, more features, and more people working on them. As the complexity of the problem grew, there were sudden bursts of experiments. This created a bottleneck in the pipeline, since:

  • The sudden burst of experiments meant increased communication: hundreds of variations of datasets, model architectures, and corresponding outcomes. Team members quickly lost track of checkpoints and other model metadata, since only a limited volume of information can be handled over a messaging platform.

    “Our experimentations go in bursts. In these bursts, we face hundreds of variations of datasets, model architectures, and corresponding outcomes.” – Viet Yen Nguyen, CTO at Hypefactors

  • Since everybody stored metadata on their own systems in their own way, comparing experiments became an issue once their number increased. Team members had to spend more time comparing experiments whose metadata was stored in different formats; this non-uniform approach created time overhead and proved error-prone, too.

    “Personal strategies vary in fidelity of tracking metadata and its context. This meant that we relied too much on our memory on what experiments were run before, creating cognitive overhead.” – Viet Yen Nguyen, CTO at Hypefactors

  • A huge number of experiments also produced loads of metadata and model artifacts, and saving these required ever more nested directories, which turned out to be more of a hassle than anticipated.
    One outcome was losing checkpoints in the process, as Andrea, Data Scientist at Hypefactors, recalls:

    “Losing track of a particular checkpoint was one of the major setbacks. The process was: we run the experiment and save the interesting checkpoints. And then, when we pick it up a few weeks later, we would lose track of the metadata and other logs belonging to that particular checkpoint, making the whole thing a mess.”

    Here’s a comment from Viet, CTO at Hypefactors, on the same issue:

    “In our team, people don’t do one task and repeat it over and over. It could be that let’s say one month we do no machine learning experiments at all. And then another month comes in and suddenly, like three people are doing experiments on the same problem because we have a bunch of new ideas. That creates a bottleneck for our current methods of tracking.”

This pushed Viet’s team to properly investigate tools that ticked all the boxes for their requirements around feature and team growth.

Solution

Need for a tool

As Viet explained, there are periods with sudden bursts of experiments and periods with little to no experimentation, for example during deployments and integrations. So they knew that:

  1. They wanted a tool that could fit their needs in terms of growing features and metrics to track
  2. They wanted a tool with a good price-to-value tradeoff, especially as they scale operations and usage
The ad-hoc techniques we used weren’t effective. At some point, everybody agreed that we could do this better. And like always, I was open to suggestions. We were interested in a cost-beneficial approach which caters to our on/off needs for experiment tracking as we scale up and down.
Viet Yen Nguyen, CTO at Hypefactors

After trying out different tools, they shortlisted Neptune and Weights & Biases, and ultimately decided to go with Neptune. Let’s look into the reasons behind this decision.

Why did they choose Neptune?

Neptune made sense to them over W&B due to its cost-benefit structure. As Viet mentioned, they run experiments only occasionally, so naturally they didn’t want to pay when they were not using the tool.

Neptune aims to understand each customer’s requirements and what they value, and recognizes that every team is unique in its approach and can’t be confined to a one-dimensional pricing structure; a pragmatic team like Viet’s certainly appreciates that. Here’s what Viet said about choosing Neptune as the solution:

The way we work is that we do not experiment constantly. After checking out both Neptune and Weights and Biases, Neptune made sense to us due to its pay-per-use, usage-based pricing. When we are doing active experiments, we can scale up, and when we’re busy integrating all our models for a few months, we scale down again.

So I discussed it with your CEO, we thought it through, and he proposed an alternative solution which worked great for us.
Viet Yen Nguyen, CTO at Hypefactors

Results

As the team shared with us, Neptune improved their entire workflow in the following ways:

  • Every checkpoint is now connected to its respective experiment and can no longer be misplaced. Metrics for every experiment are stored inside the experiment itself and are now easily comparable with other experiments (see the sketch after this list).

    “We use Neptune for most of our tracking tasks, from experiment tracking to uploading the artifacts. A very useful part of tracking was monitoring the metrics, now we could easily see and compare those F-scores and other metrics.” – Andrea Duque, Data Scientist at Hypefactors

  • Web links to experiments can be created and shared among peers easily, cutting out the hassle of communicating through Slack.

    “As opposed to before, we now post links to the Neptune results and it works great for us” – Viet Yen Nguyen, CTO at Hypefactors
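
To illustrate what this kind of workflow looks like in practice, here is a minimal sketch using the Neptune Python client’s run-based API. The project name, tags, parameters, metric values, and checkpoint path are illustrative assumptions, not Hypefactors’ actual setup; the API token is assumed to be available in the NEPTUNE_API_TOKEN environment variable.

    import neptune

    # Illustrative sketch: project name, tags, parameters, and paths are assumptions.
    run = neptune.init_run(
        project="hypefactors/enrichment",  # hypothetical workspace/project
        tags=["nlp-classification"],
    )

    # Log the run's configuration once.
    run["parameters"] = {"lr": 3e-5, "batch_size": 32, "max_epochs": 10}

    # Append validation metrics as training progresses; they stay attached to this run.
    for f1 in (0.71, 0.78, 0.81):
        run["metrics/val_f1"].append(f1)

    # Upload the checkpoint next to its metrics so it cannot be misplaced later.
    run["checkpoints/best_model"].upload("checkpoints/best.ckpt")

    run.stop()

Each run then has its own page in the Neptune UI, so a link to the run can be shared in Slack instead of the artifacts themselves, which is exactly the practice Viet describes above.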


Thanks to Viet Yen Nguyen and Andrea Duque for their help in creating this case study!

Want to pay for a metadata store
only when you actually use it?

    Contact us
