In the past few years, one would have to be a hermit not to have heard of concepts such as Machine Learning (ML), Artificial Intelligence (AI) or Deep Learning. Even though most people have at least some idea about Artificial Intelligence, there are several misconceptions about the field and its current capabilities, and about the future not only of AI itself, but of how it will change human interaction.
Although there is increasing interest in AI, learning it is by no means an easy task. There are countless resources available online, but many of them assume that readers have prior experience with mathematics and statistics, or are intended for researchers willing to learn mathematical formulations in depth. Resources for people who just want to know what is going on are few and far between.
That said, the objective of this document is to present some basic concepts of AI and Machine Learning to people with a less technical background. It is not intended to be another AI guidebook, with detailed explanations or tutorials for implementation. It is my belief that, even though this document lacks depth, it will still prove useful in introducing the concepts of Machine Learning to newcomers, without all the overwhelming technicalities.
The ultimate goal of this document is to increase Machine Learning awareness, allowing people with different backgrounds to engage in discussions about possibilities and ideas in this field. This would also help people identify others’ pains and suggest AI solutions that would not be visible otherwise.
Firstly, it is important to distinguish Artificial Intelligence from other trending terms such as Machine Learning and Deep Learning. Put very simply, Machine Learning is a subfield of AI, in the same way Deep Learning is a subfield of Machine Learning. It might become clearer in the following picture, stolen shamelessly from the internet:
A more technically accurate explanation would be that Artificial Intelligence is the field of Computer Science that studies ways to simulate humanlike intelligence. It has a number of subfields, such as Computer Vision, Natural Language Processing (NLP) and Machine Learning.
Machine Learning is the subfield of AI that aims to automate a machine's decision-making process given a set of data. There are several algorithms for this, and they can be classified as:
- Supervised learning: algorithms that are given both the data and its expected labels. After training, the model is supposed to correctly classify data outside of the training set. Examples of supervised learning algorithms are decision trees, linear regression, naive Bayes, support vector machines and K-nearest neighbors;
- Unsupervised learning: algorithms where the machine is expected to detect trends in data without any labeling given beforehand. They are useful to identify clusters in data. Some examples of unsupervised learning algorithms are K-means and principal component analysis;
- Reinforcement learning: algorithms that teach the model by giving positive reinforcement for expected results and/or negative reinforcement for undesired behaviors. This is similar to teaching tricks to a dog, which will learn them if it expects to receive treats.
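To make the supervised learning idea more concrete, here is a minimal sketch of K-nearest neighbors, one of the algorithms listed above, written in plain Python. The toy points and labels are invented for illustration; real applications would use a library and far more data.

```python
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by their distance to the query point.
    distances = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and pick the most common one.
    votes = {}
    for _, label in distances[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy labeled data: a "cat" cluster near the origin, a "dog" cluster near (5, 5).
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(points, labels, (0.5, 0.5)))  # -> cat
print(knn_predict(points, labels, (5.5, 5.5)))  # -> dog
```

Notice how the "training" here is simply remembering the labeled examples: the new, unlabeled points are classified by looking at the labels the model has already seen.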
As mentioned before, Deep Learning is a subfield of ML. A technical definition of Deep Learning is the use of neural networks consisting of several layers to achieve better results, which, more often than not, means improved data classification accuracy. Deep Learning has become more popular in the past few years due to the increased availability of data and the increased processing power of computers, especially given the availability of graphics processors for more general-purpose uses.
In the previous section, some basic concepts of Machine Learning were introduced. In this section, we will present other concepts related to the development of an ML-powered solution.
Firstly, what is the result of a model training session? It is something called a model: an API (Application Programming Interface), if you will, that can be queried for the result it was trained to produce. Depending on the algorithm used, the model can be considered a black-box or a white-box model.
A white-box model can be easily analyzed to determine why it has given a certain result for a given input. For example, we can inspect the conditions of a decision tree or check the linear function resulting from a linear regression and conclude the exact reason for a result.
It is not as easy when the model is classified as black box. The result from a neural network or a support vector machine is very difficult to explain; it is often easier to just trust that the results are correct. That said, black-box algorithms tend to produce very good results and are often used when it is not important to determine the reason a decision was made.
Another important factor to consider is the dataset used to both train and validate a model. A common restriction is the availability of data, especially for supervised algorithms, as human interaction is often required to annotate the expected results. However, in order to generate a precise model, it is often necessary to provide a good amount of data. Moreover, the data used for training should not be used to test the precision of the model, as it has already seen it.
Balancing these concerns can be challenging, and cross-validation is used to help with it. Cross-validation consists of separating the available data into buckets of similar sizes, then performing both training and validation several times, using a different bucket as the validation set in each iteration. This way, training is executed with different sets of data and several results become available. The final precision of the model is the average of these results.
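The bucket-splitting procedure described above can be sketched in a few lines of Python. The splitting scheme and the toy majority-vote "classifier" below are simplified assumptions for illustration only:

```python
def k_fold_cross_validation(data, labels, k, train_and_score):
    """Split data into k buckets; each round holds out one bucket for validation."""
    fold_size = len(data) // k
    scores = []
    for i in range(k):
        start, end = i * fold_size, (i + 1) * fold_size
        # The i-th bucket becomes the validation set...
        val_x, val_y = data[start:end], labels[start:end]
        # ...and everything else becomes the training set.
        train_x = data[:start] + data[end:]
        train_y = labels[:start] + labels[end:]
        scores.append(train_and_score(train_x, train_y, val_x, val_y))
    # The final precision estimate is the average of the per-bucket results.
    return sum(scores) / k

def majority_classifier_score(train_x, train_y, val_x, val_y):
    # A deliberately naive model: "train" by remembering the most common
    # training label, then measure accuracy on the held-out bucket.
    most_common = max(set(train_y), key=train_y.count)
    correct = sum(1 for y in val_y if y == most_common)
    return correct / len(val_y)

data = list(range(10))
labels = ["a", "a", "a", "b", "a", "a", "b", "a", "a", "a"]
print(k_fold_cross_validation(data, labels, k=5,
                              train_and_score=majority_classifier_score))
```

The key point is that every piece of data is used for validation exactly once, and never in the same round in which it was used for training.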
Another important question regarding precision is: should I always feed my model more data, if it is available? There are some important concepts related to this question. First, depending on the algorithm used, a model that has already been trained may not accept new training data. In other words, it will not learn new things.
Another point is that training a model is not a trivial task. Sometimes, it is necessary to select specific pieces of data from the dataset that emphasize features relevant to the algorithm, in order to increase its precision. It is definitely possible that adding more data to the training set will just “confuse” the model, leading to an undesirable result.
Finally, models trained with more data tend to be larger and more complex. Although it might not matter to general-use applications, if your model will run on a mobile device, it certainly can become a point of concern. For example, if you have an application that validates images taken from a cell phone camera, it might not be a good idea to have an overly complex model, as it would increase the size of the app.
In general, there are two major points of attention when training a model. It is important that the model sees enough data to be able to generalize the important patterns of the problem. When the model has not seen enough data, it may make erroneous assumptions about the data, a problem called bias.
On the other hand, the model should not be overly specific about the details of the data it has seen — in other words, it should not memorize the data — and be unable to predict the results of a different set of data. When the model has become extremely specific about the dataset used to train it, we say that it has become an overfitted model.
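As an extreme illustration of memorization, imagine a "model" that is nothing but a lookup table over its training set. The sketch below is deliberately exaggerated, and its data is invented, but it shows why a model that memorizes can be perfect on data it has seen and useless on anything new:

```python
def train_memorizer(xs, ys):
    """An extreme 'overfitted' model: a lookup table memorizing the training set."""
    table = dict(zip(xs, ys))
    def predict(x):
        # Perfect on anything it has seen; no answer at all for anything new.
        return table.get(x, "unknown")
    return predict

# The underlying rule the model *should* learn: is the number even or odd?
train_xs = [2, 4, 6, 3, 5, 7]
train_ys = ["even", "even", "even", "odd", "odd", "odd"]
model = train_memorizer(train_xs, train_ys)

print(model(4))   # seen during training -> even
print(model(8))   # unseen: the memorizer cannot generalize -> unknown
```

A real overfitted model fails more subtly than this, but the principle is the same: scoring well on the training data says nothing about how the model behaves on data it has never seen.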
As mentioned in previous sections, in the past few years the amount of available data has skyrocketed due to the increased popularity of the internet and of data analysis. In addition, computers became able to process more information, with the availability of GPUs (Graphics Processing Units) for more general purposes. In particular, Machine Learning training procedures could be performed on GPUs with greatly increased speed.
Given these circumstances, one family of algorithms thrived the most: neural networks. In the past, some of the drawbacks of neural networks were the large amount of data needed to get good results and the slow training speed. The recent developments helped to mitigate those drawbacks, making it possible to develop neural networks with several layers, hence the name Deep Learning.
Deep Learning itself is not a single technique, but the use of neural networks with several layers. More interestingly, there are specializations of it, some of which have shown very promising results: Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN).
Convolutional Neural Networks
Firstly, what does this fancy name, Convolutional Neural Networks (CNNs), mean? A neural network is a Machine Learning algorithm. Convolution is a very technical name for a “sliding window”: a small window that processes the data it sees and outputs a summarized value. In this context, the “sliding” part refers to the fact that the window is usually much smaller than the data it sees, and that it must go through all of the data in sequence.
Although it might sound like a novelty, convolutions have been used for a long time to create image filters. In general, image filters are windows that look at the neighborhood of an image pixel and replace its current value with a different one, according to the values of the pixel and its neighbors.
In the ML context, convolution is used to emphasize specific features of images, such as borders or specific shapes. In a similar fashion to the image filters, the convolution operation is used to look at the data from a pixel and its neighbors and generate an aggregated value that still significantly represents the data. When the operation is complete, the result will be an image resembling the original, but with improved contrast on some features.
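The sliding-window operation described above can be sketched in plain Python. The edge-detecting kernel and the tiny "image" below are invented for illustration; real networks learn their kernel values during training instead of having them written by hand:

```python
def convolve2d(image, kernel):
    """Slide a small kernel (window) over an image, producing one summarized
    value per position: the sum of elementwise products with the covered pixels."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):            # slide the window down...
        row = []
        for j in range(out_w):        # ...and across the image
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        output.append(row)
    return output

# A kernel that scores high where bright pixels sit next to dark ones,
# i.e. a simple vertical-edge detector.
edge_kernel = [[1, -1],
               [1, -1]]

# A tiny "image": bright (1) on the left half, dark (0) on the right half.
image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 0, 0]]

for row in convolve2d(image, edge_kernel):
    print(row)  # each row prints [0, 2, 0]: the edge column stands out
```

The output is mostly flat except where the bright and dark halves meet, which is exactly the "improved contrast on some features" described above.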
It is not unusual for a neural network to have several convolutional layers. In these cases, it is common for them to be interleaved with layers that reduce dimensionality, such as pooling layers, improving overall results.
As the previous explanation might have spoiled, CNNs are really useful for image applications. The simplest ones are used for labeling letters in a text, while more advanced ones can be used to tag people in photos on social networks or to apply image filters in social apps.
Generative Adversarial Networks
If a single neural network is able to get very promising results, why not combine multiple neural networks in order to get even better results? This is the premise of Generative Adversarial Networks (GANs). A GAN basically consists of a pair of models, one trying to achieve a result that would be considered good enough for a human (the generator) and another who judges whether a human would be fooled by it (the discriminator).
A way to see this concept is as a competition between a forger and a detective. The forger tries to fool the detective and learns more every time it is caught. The detective, on the other hand, tries to detect forgeries and learns every time it is fooled. The result is an AI that is able to generate really impressive results.
GANs are used, mostly, when it is necessary to generate data based on other kinds of data. For example, it is possible to generate animal pictures based on just descriptions. Other impressive results are pictures of people that do not exist, but seem indistinguishable from real people at first glance.
Some of the downsides of GANs are that they are painfully slow and notoriously difficult to train. It is necessary to balance the skill of the generator and of the discriminator during the training process. A generator that is too good would be able to exploit the weaknesses of the discriminator, making it impossible for the discriminator to learn what to expect. Likewise, a skilled discriminator would have extremely good confidence levels on its decisions, making it impossible for the generator to find what paths it could follow to improve.
Although it is undeniable that the AI field has achieved very impressive results in the past years, it has by no means reached its peak. There are several challenges ahead, not only from a technical standpoint, but also from ethical and legal angles.
For instance, there is a long-known problem related to ML applications for image recognition. Image datasets usually tend not to contain any kind of noise, so it should not be surprising that models tend to mislabel noisy data. However, considering that in many cases the noisy data is very similar to a clean image, the results seem disappointing, especially from a layman's standpoint.
Another concern regarding noisy data is that more modern models are still unable to consistently generalize data. In a recent study, American researchers analyzed the results for a popular dataset of natural images. By discarding the images that most models could predict successfully beforehand and leaving just the ones that were not reliably predicted, they found out that minor obstacles such as weather variation, texture differences or even pictures taken at different angles were enough to hinder the correct labeling.
This problem can also be extended to a more general case. Most models are trained considering only the happy path, where only appropriate data is used to train and validate the model. These models may behave in an unpredictable way if they are fed corrupted data, and the results could be outrageous mistakes. As ML applications become more available to end users, there should be more concerns regarding the security of the models.
A fairly recent example of this concern is Tay, a Twitter bot developed by Microsoft. It was available to the general public and could adapt itself in an attempt to engage in conversations in a more meaningful way. However, by the end of its first day, it had published several racist messages, due to the lack of care during the training period and exploitation by end users.
In short, while ML has achieved many astounding results recently, it is by no means a completely solved problem. There are still many different areas for improvement, related not only to the correctness of the model, but also to requirements not obvious at first glance, such as security.
Another area of concern is related to applications developed with ML. The impressive results in the image generation field make it hard to distinguish real images and videos from ones generated with ML. This introduced the problem of deepfakes, where people’s faces are replaced in photos and videos.
Finally, in some applications, it might be important to be able to explain how a decision was made. For example, if a bank refuses to grant a loan, it should have a very good reason for the decision, and it should be possible to audit that reasoning. This is not always possible, depending on the algorithm used to train the model.
Recently, this right to explanation has become more important due to the European Union's General Data Protection Regulation, which states that “[the data subject should have] the right … to obtain an explanation of the decision reached”. Although the regulation took effect in 2018, its impacts on the area are still not clear.
These points make it clear that, while ML is a powerful enabler for a multitude of applications, it should be regulated to some extent. Due to the novelty of many of the applications and the impressive results obtained with them, the general public may be unaware of potential misuse of information, in the sense of privacy or potential manipulation.