We open the section dedicated to machine learning with an introductory overview of the state of the art reached today in the field of artificial intelligence.

In particular, this article covers:

  • the general concepts of machine learning;
  • the types of learning and the basic terminology used;
  • the main blocks that characterize the development of machine learning systems.

What is Machine Learning?

Machine learning was born in the mid-twentieth century as a branch of artificial intelligence.

A key driver of its evolution is the large amount of data (structured and unstructured) that we are increasingly able to collect thanks to growing computerization.

Instead of leaving humans with the burden of manually deriving the rules that determine the outcomes of an event, machine learning offers a much more efficient alternative by capturing the information hidden in the data.

The idea is to learn from the data: the system gradually corrects its predictive model and, in doing so, indirectly captures the laws that govern a given phenomenon.

Virtual assistants like Alexa or Siri, spam filters like those of Gmail, and computers that beat champions at chess (and at video games) are all practical examples of applied machine learning.

How many types of Machine Learning are there?

We can divide machine learning into three macro-categories:

  • Supervised Learning
  • Reinforcement Learning
  • Unsupervised Learning

Supervised Learning

We speak of supervised learning when the data available for training are labeled, that is, when each input is paired with its expected result.
This greatly simplifies the problem: since we know the result for every input, we can focus on learning the rules that link one to the other.

Supervised learning is divided into two subcategories:

  • Classification
  • Regression


Classification

The objective of classification is to predict the class of a new piece of data, choosing from the set of categories observed during the learning phase.

To simplify things, imagine a classifier that has to tell us whether a noun (e.g. Laura, cat, chair) is masculine or feminine in a language with grammatical gender.
The starting categories, or classes, are then these two genders.

Since we are talking about a supervised classifier, we start from a database containing a fair number of nouns, each paired with its category.


This data is used to train the classifier.
Once this step is completed, we will be able to determine the gender of any other noun with a certain accuracy.

The figure below illustrates the concept of binary classification.
Starting from an already labeled dataset (blue and red points), the task of a classification algorithm is to find the rule that best separates the two categories; here that rule is drawn as a dashed black line.
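
To make the idea of a separating rule concrete, here is a minimal sketch that uses a perceptron, one of the simplest algorithms able to find such a line. The article does not name a specific algorithm, so the perceptron is my own choice for illustration, and the points and the helper names (`predict`, `train_perceptron`) are hypothetical.

```python
# Toy 2D dataset: two linearly separable groups (hypothetical values).
points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5), (6.0, 5.0), (7.0, 6.5), (6.5, 7.0)]
labels = [0, 0, 0, 1, 1, 1]  # 0 = "blue", 1 = "red"


def predict(weights, bias, point):
    """Return 1 if the point lies on the positive side of the line, else 0."""
    activation = weights[0] * point[0] + weights[1] * point[1] + bias
    return 1 if activation > 0 else 0


def train_perceptron(points, labels, learning_rate=0.1, max_epochs=1000):
    """Move the line w.x + b = 0 until every training point is on the right side."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        errors = 0
        for point, label in zip(points, labels):
            delta = label - predict(weights, bias, point)  # -1, 0 or +1
            if delta != 0:
                # Nudge the line towards the misclassified point.
                weights[0] += learning_rate * delta * point[0]
                weights[1] += learning_rate * delta * point[1]
                bias += learning_rate * delta
                errors += 1
        if errors == 0:  # the line now separates the two classes
            break
    return weights, bias


weights, bias = train_perceptron(points, labels)
predictions = [predict(weights, bias, p) for p in points]
print(predictions)
```

The learned `weights` and `bias` play the role of the dashed line: every point is assigned to one class or the other depending on which side of the line it falls. This only works when the two groups can actually be separated by a straight line, as in this toy dataset.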

It is important to note that the class set does not have to be binary as in the example shown. It is also possible to use the same techniques on larger problems, such as character recognition.

In this case the starting database can be composed of a series of handwritten characters, each labeled with the letter it represents.
Here too, once the training phase is over, our classifier will be able to recognize a written character with a certain accuracy.


Regression

When the result we want to predict is not one of a set of predefined classes but a continuous value, we speak of regression.

A simple example is predicting the temperature in a city from the measurements made over the last year as the seasons change. Cases like this cannot be reduced to a few discrete classes (hot, cold); they need continuous outputs such as floating-point numbers.

In regression, then, we provide a number of input variables together with a continuous response and try to find the relationship between them.
In the temperature example, we provide a series of measurements taken during the year and try to work out whether there is a relationship between the temperature in the chosen city and the succession of the seasons.

In the figure below you can see an example: starting from a variable x and a response y, the regression tries to approximate the trend of the data with a straight line.
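
As a rough sketch of what fitting a straight line means, the snippet below uses ordinary least squares, the classic closed-form way to fit y = slope * x + intercept. The x and y values are made up for illustration, and `fit_line` is a hypothetical helper, not something taken from a library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements: y grows roughly as 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6.0 + intercept  # continuous prediction for an unseen x
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Unlike a classifier, the output here is a number on a continuous scale, which is exactly what distinguishes regression from classification.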

If you’re interested in learning more about machine learning and the concepts we cover in these articles I highly recommend checking out this book by Sebastian Raschka.

The book covers, with great clarity, topics of classification, regression, neural networks, and then moves on to dimensionality reduction techniques, development libraries such as TensorFlow, and much more. The focus is on Python, a programming language that offers very powerful resources to develop applications that make use of these tools.

Reinforcement Learning

In reinforcement learning the goal is to develop a system, called an agent, capable of improving its performance by interacting with the environment around it.

It is the environment surrounding the agent that provides a reward signal, which lets the agent work out which strategy yields the highest reward. This signal is simply the output of a reward function defined from the constraints of the problem we are trying to solve.

A very simple example of reinforcement learning is the game of chess. An intelligent system makes moves according to the rules of the game, and each game ends in a victory or a defeat.
Depending on the result, the agent gradually learns which moves lead to winning and which do not.
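
Chess is far too large for a short example, so the sketch below swaps it for a much smaller stand-in: an agent that repeatedly chooses one of three actions and learns, from the reward signal alone, which one pays best. This is an epsilon-greedy strategy on a so-called multi-armed bandit; the reward values, the noise and the `run_bandit` helper are all assumptions made for illustration.

```python
import random


def run_bandit(true_rewards, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking action, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)  # the agent's current belief per action
    counts = [0] * len(true_rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(true_rewards))  # explore a random action
        else:
            action = max(range(len(estimates)), key=lambda a: estimates[a])  # exploit
        # The environment answers with a noisy reward (hidden from the agent's logic).
        reward = true_rewards[action] + rng.uniform(-0.1, 0.1)
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.9])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, [round(e, 2) for e in estimates])
```

The agent is never told which action is correct. It only sees rewards, and its estimates improve through interaction, which is the same principle as learning from won and lost games.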

Unsupervised Learning

Unsupervised learning, as you may have guessed, is the opposite of supervised learning: the source data used to train our model is unlabelled.

The idea is that these techniques make it possible to explore the structure of the data and extract meaningful information from it.

Generally, we can speak of two subcategories of unsupervised learning:

  • Clustering
  • Dimensionality reduction


Clustering

Clustering techniques organize the data into groups, called clusters, whose members share a certain degree of similarity.
These techniques are often called unsupervised classification, because at the end of the learning process they return a set of membership classes.

In the image below you can see a scenario in which the clustering algorithm returned three distinct groups into which the source data could be divided.
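
A scenario of this kind can be sketched with k-means, a widely used clustering algorithm. The article does not say which algorithm produced its result, so k-means is my assumption here, as are the sample points and the simple farthest-first initialization (used only to keep the example deterministic).

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_first_init(points, k):
    """Pick k starting centroids that are far apart from each other."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def k_means(points, k, max_iterations=100):
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # nothing changed, so the clusters are stable
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


# Unlabelled points (hypothetical) that visibly form three groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
          (1.0, 9.0), (1.5, 8.5), (2.0, 9.5)]

labels, centroids = k_means(points, k=3)
print(labels)
```

No labels are passed to the algorithm: the group numbers it returns come purely from the similarity (here, the distance) between points.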

Dimensionality Reduction

When the starting data has a very large number of features (dimensions), it can be difficult to manage efficiently, because of both storage space and the performance of the machine learning algorithms applied to it.

In these cases, dimensionality reduction techniques are applied to compress the data and simplify the initial set.
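
As a small illustration of compressing data, the sketch below reduces two-dimensional points to a single number each by projecting them onto the direction along which they vary the most. This is the idea behind principal component analysis (PCA), which I am using as an example technique; the article does not name one. The closed-form angle used here only works for two features, and the points are hypothetical.

```python
import math


def first_principal_component(points):
    """Return the direction of maximum variance and the 1D coordinates along it."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    var_x = sum(x * x for x, _ in centered) / (n - 1)
    var_y = sum(y * y for _, y in centered) / (n - 1)
    cov_xy = sum(x * y for x, y in centered) / (n - 1)
    # For a 2x2 covariance matrix the main axis has this closed-form angle.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(angle), math.sin(angle))
    # Each point is now described by one number instead of two.
    compressed = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, compressed


# Hypothetical points lying exactly on the line y = 2x: the second feature is redundant.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, compressed = first_principal_component(points)
print(direction, compressed)
```

Because these points lie on a line, one coordinate per point preserves all the information. With real data some information is lost, and the aim is to keep that loss small.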

Example of Dataset

At this point it is worth spending a few words to describe the characteristics of a dataset used to develop a learning system.

Let us take as an example one of the datasets contained in the UCI Machine Learning Repository. The Iris dataset contains the measurements of 150 flowers of three different species: Setosa, Versicolor and Virginica.

Each row of the dataset represents a sample, i.e. a specific flower that has been measured. The columns represent the features (or attributes) and as you can see from the image, several features have been considered for each flower.

The last column holds the class to which each sample belongs. Since this is a dataset for classification, there are three possible labels, one per species.
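
In code, a dataset with this layout can be represented as a list of rows, with the label in the last position. The sketch below shows three rows in the Iris layout (sepal and petal length and width in centimetres, then the species). The column names and variable names are my own, and the three rows are only a tiny excerpt used for illustration.

```python
# Column names are an assumption for readability; the last column is the class label.
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# One row per sample (flower): four features followed by the species.
dataset = [
    (5.1, 3.5, 1.4, 0.2, "Setosa"),
    (7.0, 3.2, 4.7, 1.4, "Versicolor"),
    (6.3, 3.3, 6.0, 2.5, "Virginica"),
]

# Separate the features (what the model sees) from the labels (what it must predict).
features = [row[:-1] for row in dataset]
labels = [row[-1] for row in dataset]
classes = sorted(set(labels))

print(len(features), "samples,", len(feature_names), "features, classes:", classes)
```

Splitting features from labels in this way is usually the first step before handing the data to a supervised learning algorithm.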

The more samples a dataset contains, the easier it generally is to achieve good results during the learning phases, since more data is available to learn from.

What is the workflow of a learning system?

So far we have been able to introduce the basic concepts of machine learning. But what are the essential pillars that give life to a learning system?

We can divide the learning workflow into four phases, in the following order:

  • Preprocessing: This is one of the most important phases of a learning process, since all the phases that follow depend on its result. The objective is to make the data as informative as possible. This is achieved, for example, by normalising the features so that they are all on the same scale, or by reducing the number of dimensions to eliminate features that may be redundant.
  • Learning: An algorithm is chosen and trained on the newly prepared data. It is always a good idea to divide the starting data into three groups: training set, validation set and test set. The first is used to train the model. The second is used during development to assess the quality of the model and to tune it, by checking how it behaves on data it was not trained on. The third is held back for the evaluation phase, where it gives a final estimate of performance on data never seen before.
  • Evaluation: Starting from the model created in the previous step, the test set is used to measure its performance on unseen data.
  • Prediction: This is the final phase, in which the learning system is actually released and used. New data are submitted to the trained model, putting to use what has been learned.
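
The four phases above can be sketched end to end in a few lines. Everything in this example is an assumption made for illustration: the synthetic two-class data, the 60/20/20 split, min-max scaling as the preprocessing step, and a k-nearest-neighbours classifier as the learning algorithm. The split is done first so that the scaling ranges are computed from the training set only.

```python
import random
from collections import Counter

rng = random.Random(42)

# Hypothetical data: two features on very different scales, two well-separated classes.
samples = []
for _ in range(20):
    samples.append(((rng.uniform(0, 1), rng.uniform(0, 100)), 0))
    samples.append(((rng.uniform(3, 4), rng.uniform(300, 400)), 1))
rng.shuffle(samples)

# Split into training (60%), validation (20%) and test (20%) sets.
train, validation, test = samples[:24], samples[24:32], samples[32:]

# 1. Preprocessing: min-max scaling, with ranges taken from the training set only.
lows = [min(x[i] for x, _ in train) for i in range(2)]
highs = [max(x[i] for x, _ in train) for i in range(2)]


def scale(x):
    return tuple((x[i] - lows[i]) / (highs[i] - lows[i]) for i in range(2))


def knn_predict(x, k):
    """2. Learning: k-NN keeps the training set and votes among the k nearest samples."""
    target = scale(x)
    nearest = sorted(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(scale(s[0]), target)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(data, k):
    return sum(knn_predict(x, k) == label for x, label in data) / len(data)


# The validation set is used to choose between candidate models (here, values of k).
best_k = max([1, 3], key=lambda k: accuracy(validation, k))

# 3. Evaluation: the test set is touched only once, to estimate final performance.
test_accuracy = accuracy(test, best_k)

# 4. Prediction: the trained model is applied to brand-new data.
new_prediction = knn_predict((0.5, 50.0), best_k)
print(best_k, test_accuracy, new_prediction)
```

The toy data is deliberately easy, so the accuracy figures say nothing about real problems. The point is the order of the steps and the fact that each of the three sets has a single, distinct job.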