Blog » ML Tools » Most Popular Programming Languages & Why They’re Useful in Machine Learning

Most Popular Programming Languages & Why They’re Useful in Machine Learning

Online forums and data science blogs have plenty of recommendations: learn that language, keep up with that framework, have you looked at that library? It can be hard to keep up with all the latest technologies, especially in machine learning, a field that changes almost daily. 

In this article, we will take it back to the basics, and discuss the most useful programming languages for machine learning. In addition to going over a brief summary of how these languages work, we will cover the basics of machine learning, popularity trends among data scientists, useful libraries, use cases, and paradigms. Let’s dive in!

Basic overview of building a Machine Learning model 

Before learning why certain programming languages are better suited to ML, it is important to understand the basics of building a ML model. 

Machine learning is the closest thing to mimicking the human brain. ML algorithms search for patterns in swaths of data – images, numbers or words – in order to make predictions. Underneath the hood of search engines and content recommendation systems are these powerful machine learning algorithms. 

The are 4 basic steps when you build a machine learning model: 

Step 1: Putting together a dataset 

The dataset must reflect real life predictions that the model will make. Training data can be sorted into classes; the model will pick out different features and patterns between the classes to learn how to tell them apart. For example, if you train a model to sort zebras and giraffes, the dataset will have images of the two animals, appropriately labelled. 

You want to prepare an all inclusive dataset, so the model’s predictions won’t be inaccurate or biased. The dataset should be randomized, deduped, comprehensive, and split into training and testing sets. 

You use the training set to train the model, and the test set to determine the accuracy of the model and identify potential caveats and improvements. The training and test sets shouldn’t have overlapping data. 


MIGHT INTEREST YOU
Image Classification: Tips and Tricks From 13 Kaggle Competitions (+ Tons of References)


Image dataset example
Example of an image dataset that could be used for object detection | Source

Step 2: Choosing an appropriate algorithm 

Depending on the task at hand, amount of training data, and whether the data is labeled or unlabeled, you use a specific algorithm. 

Common algorithms for labeled data include: 

  • Regression algorithms 
  • Decision trees
  • Instance-based algorithms

Common algorithms for unlabeled data include: 

Neural network
A neural network acts as a human brain in order to recognize relationships within huge amounts of data. | Source
  • Association algorithms

READ LATER
Recurrent Neural Network Guide – a Deep Dive in RNN


Step 3: Training the algorithm on the dataset 

The model trains on the dataset over and over, adjusting weights and biases based on incorrect outputs. For example, if we have the equation for a straight line: y = mx + b, the adjustable values for training are ‘m’ and ‘b’, or our weights and biases. We cannot impact the input (y) or output (x), so we would have to play around with m (the slope) and b (y-intercept) During the training process, random values are assigned to m and b, until the position of the line is affected in such a way that it will have the most correct predictions. As it continues to iterate, the model’s accuracy keeps increasing. 

Step 4: Testing + improving the model 

You can check the model’s accuracy by testing, or evaluating, it on new data that has never been used for training before. This will help you understand how your ML model performs in the real world. After evaluation, you can fine tune hyperparameters that we may have originally assumed in the training process; adjusting these hyperparameters can become somewhat of an experimental process that varies depending on the specifics of your dataset, model, and training process. 

Applications of Machine Learning 

Now that we learned the theory behind constructing a simple machine learning model, let’s explore various applications in the real world. This is what you will be able to build once you learn machine learning!

Recommendation Engines

Almost every major platform has recommendation systems. They collect user data to suggest products, services, or information. Recommendation systems might track data such as the user’s viewing history, how long the user views something, how they react to it, etc. 


READ ALSO
How AI and ML Can Solve Business Problems in Tourism – Chatbots, Recommendation Systems, and Sentiment Analysis


Social media 

Social media platforms use a variety of ML tools to keep their uses hooked. For example, Facebook analyzes what content you like in order to provide relevant advertising. Instagram identifies visuals with image processing. Snapchat tracks your face movement while placing a filter over it using computer vision. 

Self-driving cars

In order to avoid nearby objects such as pedestrians and other cars, self-driving cars collect data from their surroundings via sensors and cameras. Using ML algorithms such as SIFT (scale-invariant feature transform), the cars can behave as if someone is actually behind the wheel. 

Education 

ML can be used to identify struggling students or gaps in educational platforms. Flashcard tools such as Anki and Quizlet use algorithms to track memorization and retention rates.

Medicine + Health 

ML technologies are becoming a prominent part of the healthcare industry. ML algorithms can be used to determine the best course of treatment, help with more accurate diagnosis and drug development, and much more. Healthcare administrative systems use ML to map and treat infectious diseases, and provide individualized care to patients.


BOOKMARK FOR LATER
Data Science and Machine Learning in the Medical Industry


People detection 

People detection lets you identify and track people through a camera or in a video. This technology is used in security devices, and most recently being implemented in the Amazon Go Grocery Store, which tracks customers as they shop so they don’t have to checkout. 

Programming languages for ML

You now have a basic understanding of ML and its real-world applications, but it might be difficult to know where or how to start. A great first step is getting to know at least one of the main programming languages used in machine learning.

Before we dive in, let’s talk about popularity among data scientists. The Developer Survey in 2018 conducted by Stack Overflow revealed that Python was the most popular programming language, followed by Java and Javascript. 

Python

Python has undisputed popularity. Guido Van Rossum, Python’s creator, says “I certainly didn’t set out to create a language that was intended for mass consumption.” Safe to say, he has done the opposite, as Python brings coding to the forefront for people who shy away from it – long gone is overly complex syntax. 

This graph that depicts popular Python use cases is another product of the Developer Survey. Initially, web development seems like the most popular use of Python, with about 26%. However, the combination of data analysis and machine learning reveals a staggering 27%. So why is Python so popular for ML? 

Before we dive in, we have to understand that AI projects are different from traditional software projects in terms of the technology stack and skills required. Therefore, choosing a programming language that is stable, flexible, and has a diverse set of tools is so important. Python meets all of these. 

In addition to simplicity and consistency, it has a great community that helps build a variety of ML frameworks and libraries. These are packages of pre-written Python code that help machine learning engineers quickly solve common tasks and develop products much quicker. 

Popular Library Specific Use Benefits
Keras Supports multiple back-end neural computation engines User friendly, easy to learn and build models, broad adoption
Tensorflow Large numerical computations/ neural networks with many layers Beautiful computational graphs, library management, debugging, scalability, pipelining
Scikit learn Tools for ML and statistical modeling Conduct various tasks including preprocessing, clustering, model selection, etc. Lots of features, and built for a variety of purposes
NumPy Work in domain of linear algebra, fourier transform, and matrices Reduces memory usage to store data
SciPy Built on top of the NumPy extension, lets user manipulate and visualize data High level commands and classes in order to visualize and manipulate data
Pandas Used for data analysis, cleaning, exploring, transforming, visualizing Variety of features, handles large data, streamlined forms of data representation
Seaborn Built on top of Matplotlib, create beautiful and informative statistical graphics Aesthetics and built in plots

As you can see, Python provides an easy experience for ML that no other language quite can: you don’t have to reinvent the wheel here. 

Python is more intuitive than other programming languages because it has dead-simple syntax and is the best option for team implementation. Developers can focus on the ML task at hand such as complicated algorithms or versatile workflows, rather than the specifics of the language. Just take a look at this simple example: 

C:

#include<stdio.h>
#include<conio.h>
main()
{
    printf("Hello World");
}

Java: 

public class HelloWorld {
   public static void main(String[] args) {
      System.out.println("Hello, World");
   }
}

Python is also incredibly flexible and platform-independent, as it’s supported by Linux, Windows, and macOS without the need for a Python interpreter. This also makes training a lot cheaper and simpler when using your own GPUs. 57% of machine learning engineers report using Python, and 33% of them prefer it for development. 

However, as for anything, we have to take into account the disadvantages of Python as well: 

  • Has little to none statistical model packages
  • Threading in Python is quite problematic because of the Global Interpreter Lock (GIL), and multi-threaded CPU-bound applications run slower than single-thread ones. 

CHECK ALSO
Data Augmentation in Python: Everything You Need to Know


R

Now let’s look into R. R was built for high-level statistics and data visualization. For anyone who wants to understand the mathematical computations involved in machine learning or statistics, this is the language for you. 

R beats in Python in terms of data analysis and visualization. It allows for rapid prototyping and work with datasets in order to build your ML models. For example, if you would like to break down huge paragraphs into words or phrases to look for patterns, R would beat Python. 

R also comes with an impressive collection of libraries and tools to help with your machine learning pursuits. These advanced data analysis packages cover both the pre- and post- modeling stages, and are made for specific tasks such as model validation or data visualization. 

Helpful packages and libraries include:

Popular Library Specific Use Benefits
Tidyr Collection of used to “tidy” your data and make it easy to work with More efficient code, organized data
Ggplot2 Part of Tidyr, data visualization package that breaks up graphs into semantic components Explore data easier, while creating complex visualizations
Dplyr Part of Tidyr, helps with data manipulation challenges Speedy, direct connections to external databases, chain functions to reduce clutter and coding, syntax simplicity
Tidyquant Used for business and financial analysis Model and scale financial analysis

In addition to an active and helpful open source community, R is free to download and comes with GNU packages, placing it among expensive alternatives such as SAS and Matlab. R Studio is an IDE that lets developers create statistical visualizations of ML algorithms. R comes with a console, syntax highlighting editors, and other useful tools such as plotting, history, debugging, workspace management. 

rstudio-windows
Screenshot of R Studio | Source

To get started with R and download R studio, check out its website.

Now let’s talk about the not-so-fun part. The disadvantages of R include: 

  • A steep learning curve: R is a hard language, potentially making it harder to find experts for a project or team. Any new package that you use will need learning, and there is no thorough documentation of R. 
  • R can be inconsistent, as its algorithms are from third parties. 

The big question is often R or Python when it comes to ML. Both of them have their advantages and disadvantages, but Python is better for data manipulation and repetitive tasks. If you would like to build some sort of product that uses ML, go with Python. If you need some in depth analysis, R is your best bet. 

Julia

We have talked quite in depth about the two behemoth languages for ML now. But Julia is a real underdog: although not as popular as Python and R, it was made to match the functionality of Python, MATLAB, and R, along with the execution speed of C++ and Java. Now that’s reason enough to keep it in mind! Java has two huge advantages: speed + designed for parallelism.  Because it feels like a scripting language, it’s also not difficult to switch to, so Python / R developers can pick it up easily. 

In terms of AI, Julia is best for deep learning (after Python), and is great for quickly executing basic math and science. Julia focuses on the scientific computing domain and is greatly suited for it. Because of these computing capabilities, Julia is scalable and faster than Python and R. 

Its powerful native tools include: 

Popular Library Specific Use Benefits
Flux Lightweight ML library, useful tools to help you use the full power of Julia Written in Julia, comes with same functionality as Tensorflow
Knet Deep learning framework that supports GPU operation and automatic differentiation using dynamic computational graphs for models in Julia Written in Julia, active community
MLBase.jl can be used for data processing & manipulation, performance evaluation, cross validation, model tuning, etc. Written in Julia
TensorFlow.jl Julia version of Tensorflow Written in Julia, provides flexibility to express computations as data flow graphs
ScikitLearn.jl Julia version of Scikit Learn Written in Julia, lets you carry out preprocessing, clustering, model selection, etc.: a variety of purposes

Julia can also call Python, C, and Fortran libraries, and comes with an interactive command line and a full-featured debugger. 

However, compared to Python, Julia lacks in terms of object oriented programming, scalability, community, and variety of libraries. It’s still in its infancy. Most ML experts use both: Julia for the backend deep learning, where it achieves the best performance rates, and Python for the front end. 

JavaScript

Javascript tensorflow
An example of what you can do with TensorFlow.js | Source

When you think of ML, JavaScript certainly isn’t the first in mind. Although JavaScript is primarily used for web development, it has slipped into machine learning with TensorFlow.js. TensorFlow.js is an open-source library created by Google that uses JavaScript to build machine learning models in the browser, or in Node.js, JavaScript. TensorFlow.js is a great entry into ML for those only familiar with web development. TensorFlow.js supports WebGL, so your ML models can operate when a GPU is present; for example, if users open the webpage on their phones, the model can utilize sensory data. Tensorflow.js lets you import existing, pre-trained models, retrain an imported model and create models within the browser. Let’s look at the pros and cons of TensorFlow.js


LEARN MORE
Check how to keep track of TensorFlow model training metadata with Neptune.


Pros Cons
Has a high computational performance Data limitations: cannot access the browser, meaning data resources are limited
Highly secure, devices stay protected against external threats when running an application Limited support for hardware acceleration
Several use cases: Javascript applications in browser, servers inside a Node.js environment, in desktop, mobile, etc. Single threaded, which can limit performance

A lot of developers are taking ML from back-end servers to front-end applications. TensorFlow.js allows developers to now create and run ML models in pure HTML without complicated backend systems. This simplicity lets you make great projects easily. Here are a few examples: 

  • Automatic Picture manipulation: generate art via convolutional neural networks
  • Games using Ai
  • Content recommendation engines
  • Activity monitoring that learns usage patterns on a local network/ device
  • Object detection, for example to identify a license in a photo

Scala 

Scala is significantly faster than Python, and brings the best of object oriented and functional programming to one high-level language. It was originally built for the Java Virtual Machine (JVM) and is very easy to interact with Java Code. Developers can easily build high-performance systems, while avoiding bugs through Scala’s use of static types. 

Scala has several libraries that work for linear algebra, random number generation, scientific computing, etc.: 

Popular Library Specific Use Benefits
Saddle used for data manipulation through array-backed support, 2D data structures, etc. Built on top of array-backed data structures
Aerosol a fast GPU and CPU-accelerate library Accelerates programing
Breeze Main scientific computing library, best from MATLAB’s data structures and Numpy classes from Python Fast and efficient manipulations with data arrays
Scalalab Scala version of MATLAB computing functionality Comes with scalability and power of Scala
NLP Used for natural language processing Can parse thousands of sentences because of high speed and GPU usage

Scala is also a great choice for Apache Spark in terms of performance, learning curve, and ease of use (Apache Spark is a data processing framework for processing tasks on giant data sets and distributing data processing tasks through several computers). 

Now let’s compare the pros and cons of Scala: 

Pros Cons
Allows for utilization of JVM libraries, frequently used in enterprise code Steep learning curve as combines both functional and object oriented programming
Several readable syntax features Limited developer community, resources
Functional features such as string comparison advancements, pattern matching, etc.

C/C++

C/C++ and machine learning is a difficult pairing. From the get go, it seems like Python has many advantages over C/C++:

  • Python is more flexible, has a simple syntax, and is easier to learn
  • Using Python lets you focus on the nuances of ML, rather than the language
  • Tons of libraries and packages
  • You can work interactively with data just through the command line via the Python interpreter
  • Debugging in C/C++ for ML algorithms is much harder

However, there are also advantages to using C/C++: 

  • C/C++ is one of the most efficient languages – and machine learning algorithms need to be fast. 
  • Using C/C++ lets you control single resources starting from memory, CPU,etc. 
  • A lot of ML frameworks such as TensorFlow, caffe, vowpal, wabbit, libsvm, etc. are actually implemented in C++ 
  • You will definitely stand out to recruiters and companies

As one of the oldest programming languages, C and C++ are a niche in terms of machine learning. C can be used to complement existing machine learning projects and computer hardware engineers prefer C due to its speed and level of control – you can implement algorithms from scratch using C/C++. 

Generally, use C/C++ when: 

  • Speed is extremely important
  • There isn’t a Python library for your use case
  • You want to control memory usage because you will be pushing your systems limit
Popular Library Specific Use Benefits
Tensorflow Large numerical computations/ neural networks with many layers Beautiful computational graphs, library management, debugging, scalability, pipelining
Microsoft Cognitive Toolkit a deep-learning toolkit that uses a directed graph to depict neural networks a series of computational steps Opensource, lets you use huge datasets
Caffe deep learning framework that let you work with expressive architecture, extensible code, etc. Increases speed
Mlpack Lets you implement ML algorithms faster Emphasis on scalability and speed, easy to use
DyNet (Dynamic Neural Network Toolkit) Neural network library that has support for NLP, graph structures, reinforcement learning, etc. High-performance, runs efficiently on CPU or GPU
Shogun Has various ML methods such as multiple data representations, algorithms classes, general purpose tools, etc. Open source, variety of tools

Java

Java language
Screenshot of the Weka Machine Learning Workbench | Source

Java is typically known for enterprise development and backend systems. However, there are several reasons to choose Java over Python or R:

  • A lot of companies’ infrastructure, software, applications, etc. are built with Java, meaning integration and compatibility issues are minimized
  • A lot of popular data science frameworks such as Fink, Hadoop, Hive, and Spark are written in Java
  • Java can be used for various processes in data science such as cleaning data, data importation and exportation, statistical analysis, deep learning, NLP, and data visualization
  • Java Virtual Machine lets developers write code that will be identical across multiple platforms, and also build tools much faster 
  • Applications built with Java are easy to scale with 
  • Java is speedy and fast executing like C/C++, which is why Linkedin, Facebook, and Twitter use Java for some of their ML needs
  • Java is a strong typing programming language, meaning developers must be explicit and specific about variables and types of data
  • Production codebases are often written in Java

Java also comes equipped with various tools and libraries for ML as well:

Popular Library Specific Use Benefits
Weka Used for general purpose machine learning: algorithms, data mining, data analysis, predictive modeling Open source, wide variety of tools
Apache mahout Used to create scalable machine learning algorithms Scalable, ready-to-use framework for large data mining tasks
Massive Online Analysis Open source software used for data mining on data streams in real time Real time analytics
Deeplearning4j Framework with wide support for deep learning algorithms Open source, Provides high processing capabilities
Mallet Specialized toolkit for natural language processing, can use for topic modeling, document classification, clustering, info extraction Built on Java, lets you use ML to process textual documents

Energy data for all the programming languages  

In 2017, Portugal conducted an interesting research investigation and released “Energy Efficiency Across Programming Languages.” They ran the solutions to 10 problems from the Computer Language Benchmarks Game, a standard set of algorithm problems in 27 different languages, while meticulously recording how much electricity, speed, and memory each used. The results are summarized in this chart: 

As expected, out of all the languages we discussed today, C turned out to be the fastest and most energy efficient. 

Comparison of all the programming languages  

Let’s take one final look at all the various features offered by these languages:

Python R Julia Javascript Scala C/C++ Java
General
Paradigm Object oriented, imperative, functional, aspect-oriented, reflective Mix of object oriented and functional programming, imperative, reflective Multiple dispatch, therefore easy to express object-oriented and functional programming patterns Imperative, object-oriented, functional, reflective Object oriented, functional, generic Imperative Imperative, object oriented, generic, reflective
Standardized No, as the language reference is included with each version’s documentation No Yes Yes Yes Yes
Type Safety Strong strong strong weak strong weak strong
Type Strength Safe safe safe safe unsafe safe
Expression of Types Implicit implicit implicit implicit Partially implicit explicit explicit
Type Compatibility Duck, structural Nominative, structural nominative nominative
Type checking dynamic dynamic dynamic dynamic static Static static
Parameter Passing By value Value by need “Pass-by-sharing” By value By value + name By value/ pointers By value
Garbage Collection Yes yes Yes Yes Yes optional yes
Intended/ Mainstream Use ML, Web dev., Game Dev., Software dev. Statistical computing and graphics, numerical computation, visualization Parallelism,

“We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell.”
Source
Web Scripting, Web + Mobile dev, web servers + game dev. General purpose language, Parallel Computing, DSL and computing System, embedded Application, server-side, back end development, Android dev.
Design Goals Productivity, code readability, simplicity, modularity Interactive, thorough analysis of datasets, advanced data analysis General purpose, thorough analysis of datasets, powerful and fast Web Dev. Better alternative to Java, concise, type safe, scalable, platform independent Low level access Simple, secure, distributed, object oriented, robust, portable “Write Once, Run Anywhere”
Machine Learning
*Source: State of the Developer Nation Q1 2017
Popularity: Prioritized by* 57% 31% N/A 28% N/A 43% 41%
Popularity: Used by* 33% 5% N/A 7% N/A 19% 16%
Popular Usage* Sentiment analysis, NLP/chatots, web mining Sentiment analysis, bioengineering/bioinformatics, fraud detection Scientific calculations Search Engines, web dev. Big Data AI in games, robot locomotion, network security + cyber attack detection Customer support management, network security + cyber attack detection + fraud detection
Professional Background before entering ML* Data science Data analyst/ statistician N/A Front end web developer N/A Embedded computing hardware, electronics engineer Frontend desktop application developer
Reason to get into ML* Curious about ML Data science Increase chances of securing work Add machine learning to existing apps Company
Community/ Support
Popular Libraries Keras, Tensorflow, Scikit learn, NumPy, SciPy, Pandas, Seaborn Tidyr, Ggplot2, Dplyr, Tidyquant Flux, Knet, MLBase.jl, TensorFlow.jl, ScikitLearn.jl TensorFlow.js Saddle, Aerosol, Breeze, Scalalab, NLP Tensorflow, Microsoft Cognitive Toolkit, Caffe, MLpack, DyNet, Shogun Weka, Apache mahout, Massive Online Analysis, Deeplearning4j, Mallet
Resources / Documentation / Community Great Great Starting out Could be better for ML Could be bigger good good

How is Neptune useful for ML? 

Neptune.ai | Metadata Store for MLOps

Experiments in ML let you test out different hypotheses – for example, will a certain algorithm outperform others? Experiments let you change variables, collect data, and modify your conclusions as you go. 

However, the experimentation process can be somewhat chaotic, as you need to track and manage countless variables and artifacts.Here’s just a basic list of things to keep in mind:

  • Parameters: hyperparameters, model architectures, training algorithms
  • Jobs: pre-processing job, training job, post-processing job — these consume other infrastructure resources such as compute, networking and storage
  • Artifacts: training scripts, dependencies, datasets, checkpoints, trained models
  • Metrics: training and evaluation accuracy, loss
  • Debug data: Weights, biases, gradients, losses, optimizer state
  • Metadata: experiment, trial and job names, job parameters (CPU, GPU and instance type), artifact locations (e.g. S3 bucket)

It would be a hassle to keep track of it all manually. Luckily, you can streamline and organize your experimentation process with Neptune:

  • Log metrics, hyperparameters, data version, hardware usage, and more
  • Monitor your experiments live
  • Filter, group, and compare experiment runs in an intuitive UI 
  • Collaborate with team and organize projects and experiments

Learning more about ML + Neptune

You can use this article as a starting point to decide which language to explore. As you can see, there isn’t “one best option,” rather learning multiple languages to complement each other and get the best of all worlds is your best bet. When you’re starting out your ML projects and experimenting, you can use Neptune as a central place for managing them and collaborating on them with your team. If you want to learn more about Neptune, check out the official documentation. If you want to try it out, create your account and start tracking your ML experiments.


READ NEXT

ML Experiment Tracking: What It Is, Why It Matters, and How to Implement It

Jakub Czakon | Posted November 26, 2020

Let me share a story that I’ve heard too many times.

”… We were developing an ML model with my team, we ran a lot of experiments and got promising results…

…unfortunately, we couldn’t tell exactly what performed best because we forgot to save some model parameters and dataset versions…

…after a few weeks, we weren’t even sure what we have actually tried and we needed to re-run pretty much everything”

– unfortunate ML researcher.

And the truth is, when you develop ML models you will run a lot of experiments.

Those experiments may:

  • use different models and model hyperparameters
  • use different training or evaluation data, 
  • run different code (including this small change that you wanted to test quickly)
  • run the same code in a different environment (not knowing which PyTorch or Tensorflow version was installed)

And as a result, they can produce completely different evaluation metrics. 

Keeping track of all that information can very quickly become really hard. Especially if you want to organize and compare those experiments and feel confident that you know which setup produced the best result.  

This is where ML experiment tracking comes in. 

Continue reading ->
MLOps

MLOps: What It Is, Why it Matters, and How To Implement It (from a Data Scientist Perspective)

Read more

Best Tools to Manage Machine Learning Projects

Read more
Best tools featured

15 Best Tools for Tracking Machine Learning Experiments

Read more
RL examples

10 Real-Life Applications of Reinforcement Learning

Read more