Project Description

Machine Learning

The domains of Big Data and Machine Learning are very closely related and have become more interwoven in recent years. Machine Learning explores the study and construction of algorithms that can learn from and make predictions on data.

Machine Learning (ML) aims to ‘teach’ computers to perform certain operations (by running machine learning algorithms), so that the computer can ‘learn’ from previous situations and make better decisions in the future. ML is widely used for data mining, which is the practice of sifting through large amounts of data to find unknown or hidden patterns.

At the highest level, ML can be divided into two classes:

  1. Supervised Machine Learning
  2. Unsupervised Machine Learning

Figure 1: The main machine learning domains

Supervised Machine Learning

In supervised machine learning, a computer learns a certain task because it is fed labeled training data. In other words, the computer is first confronted with a number of ‘sample cases’, from which it learns what decision to make. When new data then enters the system, the system ‘knows’ what decision to make. For this reason, supervised machine learning is mostly associated with classification and regression techniques.

An example of supervised machine learning is the sorting of email messages into either a spam folder or the regular inbox. The computer first needs to ‘learn’ which type of messages should be considered spam, by being fed a set of labeled training data. After the computer has ‘understood’ this training set and derived certain rules from it, it can classify future emails by itself.
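
The spam example can be sketched in a few lines of Python. This is only a minimal illustration, assuming a simple naive Bayes word-count approach; the training messages, the labels ‘spam’ and ‘regular’, and the function names train and classify are all invented for this sketch and do not come from any particular mail system.

```python
import math
from collections import Counter

def train(examples):
    """Count word occurrences per label from (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen under that label
    label_counts = Counter()  # label -> number of training messages
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, model):
    """Return the label with the highest naive Bayes score for the text."""
    word_counts, label_counts = model
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior: how common this label was in the training data.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled training data (invented for illustration).
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "regular"),
    ("lunch on monday with the team", "regular"),
]

model = train(training_data)
print(classify("claim your free prize", model))        # spam
print(classify("agenda for the team meeting", model))  # regular
```

The labels in training_data are what make this supervised: the ‘rules’ the computer derives are simply word frequencies per label, which it then applies to messages it has not seen before. A real spam filter would need far more training data and richer features.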

Unsupervised Machine Learning

Here, a computer is fed data and needs to find interrelationships in the data, without any prior knowledge about the data set. Any set of data can be fed into the computer, after which the machine will try to find certain patterns and relationships within the data. Unsupervised machine learning is therefore ideal for the purpose of data mining. The techniques associated with unsupervised machine learning are clustering and correlation.

An example of unsupervised machine learning would be to feed a large number of insurance claims into a computer. Based on unsupervised learning algorithms, the computer might find that certain claims do not fit within a regular pattern and therefore might be fraudulent. These outliers would then need to be evaluated and validated by insurance agents.
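
As a rough sketch of the claims example, the Python snippet below flags unusual claim amounts without any labels. It does not use clustering or correlation; instead it assumes a simpler stand-in, the modified z-score based on the median absolute deviation, with the commonly used constant 0.6745 and cut-off 3.5. The claim amounts and the function name find_outliers are invented for illustration.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a measure of spread that is robust to outliers.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so no value stands out under this measure
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical claim amounts (invented for illustration); no labels are given.
claim_amounts = [1200, 1350, 1100, 1275, 1420, 1190, 1310, 9800, 1250, 1380]
print(find_outliers(claim_amounts))  # [9800]
```

Nothing tells the computer in advance which claims are suspicious; it only finds that one amount does not fit the pattern of the rest. Whether that claim is actually fraudulent is still a question for an insurance agent.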

Although it is technically not necessary to have ‘big’ data sets in order to perform machine learning operations, much value can be generated when the two are paired. In the rest of this guide, we will therefore consider machine learning in the context of Big Data.
