Telecom companies have a lot of business and functional divisions that make them tick.
Data scientists who work in telecom take on different tasks depending on their division. There might be a data science team that improves customer experience, or one that powers the product and engineering division.
In this article, we’ll look at popular use cases of data science and related tools in a telecom company, from my perspective as an ex-telecom data scientist.
My typical day as a Data Scientist in telecom
As a data scientist in the customer experience team, my role involved three main types of tasks.
Business As Usual (BAU)
BAU is where you track certain KPIs, or tune/refresh an existing machine learning model.
Honestly, it’s the least interesting work for a passionate data scientist, since it involves very little innovation and plenty of repetitive tasks.
But it’s a very important part of telecom companies, because it helps leadership and executives make better decisions and run the business efficiently.
Building Proof of Concept (POC)
Most new projects start with a Proof of Concept. It’s quite common to set aside a block of time to build these POCs.
Unlike BAU, this work requires focus, research, and development. It’s more creative work, and its outcome isn’t always positive.
Consulting
Often, business stakeholders come up with a business question, like “why is NPS (Net Promoter Score) going down?”
As a consulting data scientist (many data scientists hired for this role are evaluated on their consulting-style problem-solving skills), you have to translate that business question into a data science problem.
First, you might explore the factors that influence NPS. Is there any seasonality to it? Sometimes, if the NPS survey isn’t user-friendly, customers give bad scores simply because they don’t like the survey. As you can see, these problems are nuanced and require data scientists to understand the nature of the business, and even psychology.
Data scientists need to wear their consulting hat, and translate what their analysis tells them into valid, valuable conclusions for business stakeholders.
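To make the seasonality question concrete, a first pass could compute NPS per month from raw survey responses. The sketch below assumes a hypothetical survey table with one row per response (`date`, and `score` on the usual 0-10 scale) and uses the standard NPS definition: the percentage of promoters (9-10) minus the percentage of detractors (0-6). The data is made up for illustration.

```python
import pandas as pd

def nps(scores: pd.Series) -> float:
    """Net Promoter Score: % of promoters (9-10) minus % of detractors (0-6)."""
    promoters = (scores >= 9).mean()
    detractors = (scores <= 6).mean()
    return 100 * (promoters - detractors)

# Hypothetical survey export: one row per completed NPS survey (0-10 scale).
surveys = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-05", "2021-01-19", "2021-01-28",
        "2021-02-03", "2021-02-14", "2021-02-25",
        "2021-03-02", "2021-03-15", "2021-03-30",
    ]),
    "score": [10, 9, 6, 7, 3, 9, 10, 10, 8],
})

# NPS per calendar month. With a few years of data, a recurring dip in the
# same months each year would hint at seasonality.
monthly_nps = surveys.groupby(surveys["date"].dt.to_period("M"))["score"].apply(nps)
print(monthly_nps.round(1))
```

Three months of data can't establish seasonality on its own; the same groupby over several years of responses, or a breakdown by channel or survey version, is where the consulting questions above start to get answers.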
BAU, POC and Consulting are broad categories of data scientist tasks, but not the only ones. Some other types of tasks, which may overlap with the above three, are:
Meetings
If you’ve worked for a large corporation, you know that a huge amount of your time is spent in meetings.
If your team (like the one I was part of) works in an Agile methodology, your calendar is going to be filled with sprint-related meetings: daily stand-ups, sprint closures, sprint retros, sprint planning, and backlog grooming / story prioritization.
And these are just the sprint-related ones. There are other types of internal meetings too: team meetings, one-on-ones with managers, skip-level meetings, and so on.
Whether you’re a software engineer or a data scientist, meetings are arguably the biggest curse of working for a big organization.
You need them because they provide the context for the problems you’re solving, but too many meetings can exhaust you. On days with sprint planning or a team meeting, you’re going to feel tired just from attending meetings.
Data Governance
When you’re dealing with data, the process starts with identifying what type of data you need, who owns it (which business unit), where it’s stored, whether you need information security approval to access it, who needs to sign off, and so on.
There are a lot of questions you might have to answer, and hurdles to clear, before you can even begin your project or POC. In a telecom company, data is the holy grail. It has to be treated with the utmost care; otherwise you’re endangering not just your team, but your entire company and its brand value.
There are tons of safeguards in place to prevent a catastrophe, and they can significantly delay the start of your project.
Sometimes, data science projects get cancelled simply because someone couldn’t get through the data governance process, or because the process took longer than the project itself would have.
You can’t start a project before you go through all the necessary data governance procedures.
Data Science and Machine Learning use cases in telecom
These use cases are limited to the perspective of a customer experience data scientist. Most data science use cases in a corporation revolve around the team’s business KPIs.
Typically, the data science team is tasked with either improving those KPIs or preventing them from declining.
For a customer experience (CX) team, example KPIs could be:
- Customer Churn,
- Active Users – DAU, WAU, MAU (Daily, Weekly & Monthly Active Users),
- NPS – Net Promoter Score,
- CSAT – Customer Satisfaction Score.
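These KPIs are simple ratios once the raw counts are available. The sketch below shows one common way to compute each; the exact definitions vary between companies, so treat the following as assumptions: churn rate as customers lost over customers at the start of the period, CSAT as the share of 4-5 ratings on a 1-5 scale (a "top-two-box" convention), and DAU/MAU "stickiness" as a way to relate the active-user counts.

```python
from typing import Sequence

def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of the customers present at the start of a period who left during it."""
    return customers_lost / customers_at_start

def csat(responses: Sequence[int], satisfied_threshold: int = 4) -> float:
    """CSAT as the % of 1-5 ratings at or above the threshold (assumed convention)."""
    satisfied = sum(1 for r in responses if r >= satisfied_threshold)
    return 100 * satisfied / len(responses)

def nps(responses: Sequence[int]) -> float:
    """NPS on a 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    return 100 * (promoters - detractors) / len(responses)

def stickiness(dau: float, mau: float) -> float:
    """DAU/MAU ratio: roughly, the fraction of monthly users active on a typical day."""
    return dau / mau

# Hypothetical inputs, for illustration only.
print(churn_rate(10_000, 250))        # share of customers lost this period
print(csat([5, 4, 3, 5, 2]))          # 3 of 5 ratings are 4 or 5
print(nps([10, 10, 9, 7, 2]))         # 3 promoters, 1 detractor
print(stickiness(30_000, 120_000))
```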
The use cases based on these KPIs would be:
- Customer churn prediction,
- Identifying factors driving or influencing customer churn,
- Customer churn cost estimation,
- Increasing DAU, WAU, MAU (user retention & activity),
- Predicting NPS for a given customer case,
- Identifying factors driving or influencing NPS,
- Customer feedback insights from NPS & CSAT survey feedback,
- Sentiment analysis from NPS feedback,
- Building customer 360 intelligence (mostly for B2B accounts),
- Recommendation engine for upselling and cross-selling.
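The article doesn’t say how such a recommendation engine was built. As one simple illustration for cross-selling, here is an item co-occurrence approach ("customers who hold product X often also hold product Y") on hypothetical product holdings. This is a swapped-in technique for illustration, not the author’s method; real systems would also weigh eligibility, margins, and customer context.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical product holdings per customer.
holdings = {
    "c1": {"mobile", "broadband", "tv"},
    "c2": {"mobile", "broadband"},
    "c3": {"mobile", "tv"},
    "c4": {"broadband", "tv"},
    "c5": {"mobile", "broadband", "insurance"},
}

# co_counts[a][b] = number of customers who hold both product a and product b.
co_counts = defaultdict(Counter)
for products in holdings.values():
    for a, b in permutations(products, 2):
        co_counts[a][b] += 1

def recommend(customer: str, k: int = 2) -> list:
    """Rank products the customer doesn't hold by co-occurrence with products they do hold."""
    owned = holdings[customer]
    scores = Counter()
    for product in owned:
        for other, count in co_counts.get(product, {}).items():
            if other not in owned:
                scores[other] += count
    return [product for product, _ in scores.most_common(k)]

print(recommend("c3"))
```

For customer `c3` (mobile and TV), broadband co-occurs with their products far more often than insurance, so it ranks first. Raw counts favour popular products; normalising by product popularity (as in lift or cosine similarity) is a common refinement.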
How are these use cases handled? First of all, it starts with an idea pitch. Let’s say, “recommendation engine” is our idea, and we’re pitching it to the relevant stakeholder – Head of CX or Customer Retention.
Next, we’d have some questions to answer during the pitching session. If the stakeholder is convinced by the pitch, the idea moves on to the POC stage.
The Proof of Concept stage is where we get to build a tiny prototype, using a small sample of real data. This is very similar to the MVP process in product companies.
The POC’s success is measured against several factors (the success metrics are usually defined during the idea pitching stage). If “recommendation engine” is the idea, the success metrics could be:
- How much upselling can this generate?
- How much customer wallet share can this recommendation engine help capture?
- Does it also reduce customer churn?
- Would it improve customer retention?
All these questions are usually answered both qualitatively and quantitatively, with actual numbers (for the small subset of sample data). If the project sponsor or business stakeholder/team believes that the POC is a success, then the project moves on to the production stage.
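For the "how much upselling can this generate?" question, one quantitative answer is the lift over a control group. The sketch below assumes the POC was run with a hypothetical control group that received no recommendations, and that an average upsell value is known; all numbers are made up.

```python
def incremental_upsell(control_conversions: int, control_size: int,
                       treated_conversions: int, treated_size: int,
                       avg_upsell_revenue: float) -> dict:
    """Compare upsell conversion with and without recommendations."""
    control_rate = control_conversions / control_size
    treated_rate = treated_conversions / treated_size
    absolute_lift = treated_rate - control_rate
    return {
        "control_rate": control_rate,
        "treated_rate": treated_rate,
        "absolute_lift": absolute_lift,
        "relative_lift": absolute_lift / control_rate,
        # Extra upsells attributable to the recommendations in the treated group.
        "incremental_upsells": absolute_lift * treated_size,
        "incremental_revenue": absolute_lift * treated_size * avg_upsell_revenue,
    }

# Hypothetical POC: 2,000 customers per group, average upsell worth 15 currency units.
poc = incremental_upsell(40, 2000, 70, 2000, avg_upsell_revenue=15.0)
for metric, value in poc.items():
    print(f"{metric}: {value:.4f}")
```

With these numbers the conversion rate moves from 2% to 3.5%, a 1.5-percentage-point lift, or about 30 extra upsells in the treated group. Whether a lift like this is more than noise is a separate question, usually settled with a statistical test.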
Moving a data science project from the POC/prototype stage to production generates new challenges.
Up to this point, only the data science team and its business counterpart were involved. Moving to production is an entirely new ballgame, which typically involves:
- optimizing our existing code,
- following engineering/development team’s coding standards,
- quality assurance of the code and the machine learning model,
- managing dependencies,
- setting up CI/CD pipelines for future updates,
- deciding how to serve inference: as a microservice, an API, or a web app.
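To show what "serving inference as an API" means at its simplest, here is a standard-library-only WSGI sketch that accepts a JSON object of features and returns a churn probability. The feature names and coefficients are hypothetical stand-ins for a trained model; in practice a team would more likely use a web framework behind a production server, but `wsgiref`-compatible WSGI keeps this example self-contained.

```python
import io
import json
import math

# Hypothetical coefficients standing in for a trained logistic churn model.
WEIGHTS = {"tenure_months": -0.05, "support_calls": 0.4, "monthly_bill": 0.01}
BIAS = -1.0

def predict_churn(features: dict) -> float:
    """Logistic model: sigmoid of a weighted sum of the features."""
    z = BIAS + sum(w * float(features[name]) for name, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))

def app(environ, start_response):
    """Minimal WSGI app: POST a JSON object of features, get back a churn probability."""
    headers = [("Content-Type", "application/json")]
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", headers)
        return [b'{"error": "use POST"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        features = json.loads(environ["wsgi.input"].read(length))
        body, status = {"churn_probability": predict_churn(features)}, "200 OK"
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing features, or non-numeric values.
        body, status = {"error": str(exc)}, "400 Bad Request"
    start_response(status, headers)
    return [json.dumps(body).encode()]

def call_app(wsgi_app, payload: bytes, method: str = "POST"):
    """Invoke the WSGI app in-process (handy for unit tests in a CI pipeline)."""
    captured = {}

    def start_response(status, response_headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(payload)),
        "wsgi.input": io.BytesIO(payload),
    }
    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], json.loads(body)

status, body = call_app(app, json.dumps(
    {"tenure_months": 24, "support_calls": 3, "monthly_bill": 80}).encode())
print(status, body)
```

To try it over HTTP locally, `app` can be passed to `wsgiref.simple_server.make_server(host, port, app)`. The `call_app` helper reflects the QA and CI/CD items above: the endpoint can be tested without starting a server.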
This requires the involvement of many more teams (IT, DevOps, Engineering), and a lot more project management procedures.
An important thing to note: at the POC stage, we present a deadline for when the project might go live, so it’s important to take all the stages between POC and production into account when estimating that deadline.
Data Science tools, libraries, and frameworks used in telecom
It’s common to see different sets of tools used to solve the same type of problems in large organizations.
Take data visualization as an example. You might see the data science team (handling visualization) using Tableau for one project, and Qlik Sense for another project, while at the same time some team members might be building a new dashboard with R Shiny or Python Plotly Dash.
Bottom line: one problem, multiple tools. Also, because telecom companies are usually capital-heavy, they don’t shy away from investing in expensive proprietary tools.
Let’s take a look at some of the tools used in the customer experience data science team of a telecom company.
Data Infrastructure
Data infrastructure is often ignored and rarely talked about when discussing data science teams. It’s blasphemous not to recognize the amount of work the data engineering team puts in to make data and computational platforms available to the data science team.
Unlike many modern tech companies, a telecom company doesn’t have the luxury of adopting cloud solutions, because of strict security and compliance measures.
Combined with regulatory requirements, this makes it very difficult to use cutting-edge computational platforms. Technologies such as Teradata and SAS, which are considered old-school and expensive, still power many telecom data science teams.
What if you need a powerful machine? You’ll probably get access to a powerful remote server through VDI or Citrix (virtualization), which comes with its own security and data-location challenges.
Even if you’ve got a powerful local computer or laptop, there are tons of restrictions, such as:
- Inability to download customer data to a local machine,
- Need to anonymise personally identifiable information (PII).
These are just a few of the complications you need to overcome. All in all, data in a telecom company is very difficult to work with. The difficulty isn’t necessarily data complexity; it’s security, compliance, data sources, storage, computational limitations, and so on.
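As an illustration of the PII point, a common pattern is to replace direct identifiers with keyed hashes (so records can still be joined) and to mask fields like phone numbers. The sketch below assumes a hypothetical record layout and a secret key held outside the code; note that keyed hashing is pseudonymisation rather than full anonymisation, so the output may still count as personal data under GDPR.

```python
import hashlib
import hmac
import re

# Assumption: in practice this key comes from a secrets manager, never from code or notebooks.
SECRET_KEY = b"replace-with-a-key-from-your-secrets-manager"

def pseudonymise(customer_id: str) -> str:
    """Keyed SHA-256 hash: the same ID always maps to the same token, so joins still work,
    but the token can't be recomputed without the key."""
    return hmac.new(SECRET_KEY, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()

def mask_phone(number: str, visible: int = 3) -> str:
    """Strip formatting and keep only the last `visible` digits of a phone number."""
    digits = re.sub(r"\D", "", number)
    return "*" * (len(digits) - visible) + digits[-visible:]

# Hypothetical customer record.
record = {"customer_id": "CUST-001923", "msisdn": "+61 412 345 678", "plan": "Unlimited 5G"}

safe_record = {
    "customer_token": pseudonymise(record["customer_id"]),
    "msisdn_masked": mask_phone(record["msisdn"]),
    "plan": record["plan"],  # non-identifying attributes pass through unchanged
}
print(safe_record)
```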
Adobe Analytics
Adobe Analytics (formerly known as Adobe SiteCatalyst or Omniture) is a very popular digital analytics tool that telecom companies use to collect and analyse their digital (web and app) clickstream data.
This tool helps them understand which web pages are the most popular, which pages have the highest or lowest bounce rate (so that those pages can be optimized), how to optimize conversion rates, and more.
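Tools like Adobe Analytics report bounce rate out of the box, but the underlying idea is easy to reproduce from raw hit-level data, for example when combining clickstream with other sources. The pandas sketch below assumes a hypothetical export with one row per page view (`session_id`, `timestamp`, `page`); it is not Adobe’s data format. A session "bounces" if it contains exactly one page view, and bounce rate is attributed to the session’s entry page.

```python
import pandas as pd

# Hypothetical clickstream export: one row per page view.
hits = pd.DataFrame({
    "session_id": ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s5"],
    "timestamp": pd.to_datetime([
        "2021-03-01 09:00", "2021-03-01 09:02", "2021-03-01 10:15",
        "2021-03-01 11:00", "2021-03-01 11:01", "2021-03-01 11:05",
        "2021-03-01 12:30", "2021-03-01 13:45",
    ]),
    "page": ["/home", "/plans", "/plans", "/home", "/plans", "/checkout", "/home", "/support"],
})

# One row per session: where it started and how many pages it viewed.
hits = hits.sort_values(["session_id", "timestamp"])
sessions = hits.groupby("session_id").agg(
    entry_page=("page", "first"),
    page_views=("page", "count"),
)
sessions["bounced"] = sessions["page_views"] == 1

# Bounce rate per entry page: share of sessions that started there and went nowhere else.
bounce_rate = sessions.groupby("entry_page")["bounced"].mean().sort_values(ascending=False)
print(bounce_rate)
```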
Python – NumPy, Pandas, Scikit-learn, Matplotlib, Plotly
The Python data science stack is everywhere and almost all teams use it; telecom is no exception.
Pandas, which is built on NumPy, is the go-to data analysis and data manipulation tool for many Python developers. Pandas coupled with Seaborn (which is built on Matplotlib) is used for exploratory data analysis (EDA).
Scikit-learn is definitely the go-to package for machine learning. I’m using “machine learning” loosely here: it covers everything from building a baseline ML model to hyperparameter tuning and model deployment.
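Here is a minimal scikit-learn sketch of the baseline-plus-tuning workflow, on synthetic data standing in for a churn table (the real features and class balance are assumptions). Putting scaling and the model in one `Pipeline` means preprocessing is refit inside each cross-validation fold, and `GridSearchCV` handles the hyperparameter search.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a churn table: 10 numeric features, roughly 20% churners.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Baseline model: scaling + logistic regression.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning: search the regularisation strength C, scored by ROC AUC.
search = GridSearchCV(pipeline, param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
                      scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

print("best C:", search.best_params_["model__C"])
print("cross-validated AUC:", round(search.best_score_, 3))
print("held-out AUC:", round(search.score(X_test, y_test), 3))
```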
These days, interpretable machine learning (also known as “explainable AI”) is gaining traction. It helps explain so-called “black box” models to business stakeholders.
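One model-agnostic way to explain a "black box" model is permutation importance: shuffle one feature at a time on held-out data and measure how much the score drops. The sketch below uses scikit-learn’s `permutation_importance` on a random forest trained on synthetic data; the feature names are hypothetical labels attached to synthetic columns, so the ranking itself carries no real-world meaning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical names for the synthetic columns below.
feature_names = ["tenure_months", "support_calls", "monthly_bill", "data_usage_gb", "network_drops"]

X, y = make_classification(n_samples=1500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Drop in held-out ROC AUC when each feature is shuffled, averaged over 10 shuffles.
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=10, random_state=0)

ranking = sorted(zip(feature_names, result.importances_mean, result.importances_std),
                 key=lambda row: row[1], reverse=True)
for name, mean, std in ranking:
    print(f"{name:>15}: {mean:.3f} +/- {std:.3f}")
```

A ranked list like this is often easier to walk stakeholders through than model internals. Strongly correlated features can share (and so understate) importance, which is worth mentioning when presenting it.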
It’s quite common to jump into interactive visualizations: sometimes as part of standalone EDA in Jupyter Notebooks, other times as part of a dashboard. Thanks to Plotly and Plotly Dash, building interactive visualizations, and a simple dashboard on top of them, is quite easy in Python.
Note that Plotly Dash dashboards and visualizations may not be the final format that the leadership team or senior executives want to receive. They might prefer something like Tableau because they’re familiar with it.
So far, I haven’t mentioned anything related to deep learning. The fact is, in my experience deep learning was rarely used. The areas where it does show up are network analytics, customer support, and POCs.
You can count the deep learning projects in production on your fingers (e.g., chatbots, anomaly detection, forecasting, computer vision).
But that doesn’t mean quality ML problems aren’t being solved. For tabular data, frameworks like XGBoost, LightGBM, and CatBoost are quite handy, given that accuracy is just one of the success metrics in such a data science project (alongside explainability, maintenance, and deployment).
R + R Shiny
The R programming language (with RStudio as an IDE) is a popular choice for many data science teams. It’s often preferred over Python for time series forecasting (thanks to the versatile `forecast` package), for example forecasting NPS, CSAT, or churn for the upcoming quarter or financial year.
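To show what a basic forecast of that kind involves, here is a hand-rolled Python sketch of simple exponential smoothing (written in Python for consistency with the other examples; it is not a port of the `forecast` package). The smoothing factor is picked from a small grid by minimising in-sample one-step-ahead error, and the quarterly NPS history is hypothetical.

```python
from typing import Sequence

# Candidate smoothing factors for a simple grid search.
ALPHA_GRID = tuple(i / 10 for i in range(1, 10))

def simple_exponential_smoothing(series: Sequence[float], alpha: float) -> float:
    """Return the one-step-ahead forecast.

    The level is updated as level = alpha * y_t + (1 - alpha) * level,
    starting from the first observation; the final level is the forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def one_step_sse(series: Sequence[float], alpha: float) -> float:
    """In-sample sum of squared one-step-ahead forecast errors for a given alpha."""
    level, error = series[0], 0.0
    for value in series[1:]:
        error += (value - level) ** 2
        level = alpha * value + (1 - alpha) * level
    return error

# Hypothetical quarterly NPS history, oldest first.
quarterly_nps = [31.0, 29.5, 33.2, 35.1, 34.0, 32.8, 36.4, 37.9]

alpha = min(ALPHA_GRID, key=lambda a: one_step_sse(quarterly_nps, a))
forecast = simple_exponential_smoothing(quarterly_nps, alpha)
print(f"alpha={alpha}, next-quarter NPS forecast={forecast:.1f}")
```

Simple exponential smoothing models neither trend nor seasonality, so it is only a baseline; packages like `forecast` offer richer models such as ETS and ARIMA that handle both.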
R is also favored for running statistical tests in experiments (A/B testing). Shiny is an R package that lets you build web applications and dashboards using only R, and it’s usually the choice for rapid prototyping or model tuning with business stakeholders.
Given that Shiny Server for Linux is available for free (unlike Tableau Server), it’s a cost-effective option for teams to try out.
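For the A/B testing use case, a typical question is whether two conversion rates really differ. Below is a pooled two-proportion z-test written with only Python’s `math` module (for consistency with the other examples; in R you’d typically use a built-in test function instead). The experiment counts are hypothetical.

```python
from math import erfc, sqrt

def two_proportion_ztest(successes_a: int, n_a: int, successes_b: int, n_b: int):
    """Two-sided z-test for a difference between two proportions, using pooled variance."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical experiment: control converts 120 of 2,400; variant converts 156 of 2,400.
z, p = two_proportion_ztest(120, 2400, 156, 2400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation behind this test assumes reasonably large samples; with very small counts, an exact test is the safer choice.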
Tableau
Tableau is one of the most popular ways to share insights from data science analysis, because it’s easy for non-technical executives to explore and to use for regularly tracking their KPIs.
Data science teams also adopt Tableau as a platform for delivering insights because Tableau developers are readily available in the job market, and data scientists can get started with it quickly.
What surprised me the most when I started working in telecom, and the biggest challenges
Information Security
Information and data security is treated as the holy grail in telecom. If you’re going to work with anything data-related in telecom, you should be ready to bear the pain of waiting for data, signing InfoSec approval forms, following up with the Data Privacy Officer (DPO), and treating the data with the utmost care.
Even a leak of seemingly trivial data could be a huge disaster for your company. On top of all this, complying with GDPR adds another layer of complexity to data science and machine learning projects.
Single Source of Truth (SSOT)
Collecting data is hard because of security policies, and combining multiple data sources into a single source of truth is just as hard. Why?
It’s primarily because the telecom industry still relies on many legacy systems, and data residing in or coming from legacy systems is very different from the data you get from modern digital systems.
For example, a legacy point-of-sale system is very different from the latest online payment or recharge system, and you might need middleware to bridge the two.
All in all, the more data sources, the more complicated creating a Single Source of Truth becomes.
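A small part of that bridging work is agreeing on a join key. The pandas sketch below assumes two hypothetical sources, a legacy point-of-sale extract with locally formatted mobile numbers and an online recharge table with international-format numbers, and normalises both to one key before an outer join. The Australian country code `61` is an assumption made only for the example.

```python
import re

import pandas as pd

def normalise_msisdn(raw: str, country_code: str = "61") -> str:
    """Bring phone numbers from different systems into one international-format key.

    Assumes a single country: a leading 0 (local trunk prefix) is replaced by the country code.
    """
    digits = re.sub(r"\D", "", str(raw))
    if digits.startswith("0"):
        digits = country_code + digits[1:]
    return "+" + digits

# Hypothetical legacy point-of-sale extract: local formats with spaces and dashes.
legacy_pos = pd.DataFrame({
    "mobile_no": ["0412 345 678", "0498-765-432"],
    "store_id": ["S01", "S07"],
    "last_purchase": ["2020-11-02", "2020-12-19"],
})
# Hypothetical online recharge system: already in international format.
online_recharge = pd.DataFrame({
    "msisdn": ["+61412345678", "+61400111222"],
    "last_recharge": ["2021-01-10", "2021-01-12"],
})

legacy_pos["msisdn"] = legacy_pos["mobile_no"].map(normalise_msisdn)
online_recharge["msisdn"] = online_recharge["msisdn"].map(normalise_msisdn)

# Outer join keeps customers seen in only one system; `_merge` shows where each row came from.
customer_view = legacy_pos.drop(columns="mobile_no").merge(
    online_recharge, on="msisdn", how="outer", indicator=True)
print(customer_view)
```

Each additional source adds its own key formats, null conventions, and update cadence, which is why the effort grows with the number of systems.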
Conclusion
In this post, we explored what it’s like to be a data scientist in a telecom company’s customer experience team.
We also covered relevant data science and machine learning use cases, and went through tools, frameworks, and tech stacks.
But most importantly, we looked at the challenges and surprises you’ll face if you become a data scientist in telecom.
This was a short summary of my personal experience, and I hope it was useful for you to see what it’s like to be a data scientist in telecom.