Machine Learning - Part Ten


4. Neural Networks and Deep Learning

"We have a thousand-layer network, dozens of video cards, but still no idea what to use it for. Let's generate cat pictures!"

Currently used for:

Replacing all the above algorithms

Object recognition in images and videos

Speech recognition and synthesis

Image processing, style transfer

Machine translation

Well-known architectures: Perceptron, Convolutional Neural Network (CNN), Recurrent Neural Networks (RNN), Autoencoders

Essentially, a neural network is a set of neurons and connections between them. A neuron is a function with a set of inputs and one output. Its job is to take all the numbers from its inputs, perform a function on them, and send the result to the output.

Here's an example of a simple yet useful real-life neuron: sum all the numbers from the inputs, and if the sum is greater than N, output 1; otherwise, output zero.
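As a minimal sketch in Python (the threshold N = 5 is an arbitrary value picked purely for illustration), that neuron could look like this:

```python
# A toy version of the neuron described above:
# sum the inputs and fire (output 1) only if the sum exceeds the threshold N.
def threshold_neuron(inputs, n=5):
    total = sum(inputs)            # take all the numbers from the inputs
    return 1 if total > n else 0   # fire only if the sum is big enough

print(threshold_neuron([2, 1, 1]))  # 4 is not greater than 5 -> 0
print(threshold_neuron([3, 2, 4]))  # 9 is greater than 5 -> 1
```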

Connections are like channels between neurons. They connect the outputs of one neuron with the inputs of others so they can send a number to each other. Each connection has only one parameter - weight. It's like a connection strength for a signal. When the number 10 passes through a connection with a weight of 0.5, it becomes 5.

These weights tell the neuron to respond more to one input and less to another. During training, the weights are adjusted - this is how the network learns. Basically, that's all there is to it.
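To make the weights concrete, here is the same toy neuron with weighted connections; the numbers are invented purely for illustration, and "training" is nothing more than nudging those weights:

```python
# The same toy neuron, but each input passes through a weighted connection first.
def weighted_neuron(inputs, weights, n=5):
    # every input is scaled by its connection weight before being summed
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > n else 0

# the number 10 through a connection with weight 0.5 arrives as 5
print(10 * 0.5)                               # 5.0

# adjusting a weight changes whether the neuron fires - that's all training does
print(weighted_neuron([10, 10], [0.5, 0.1]))  # weighted sum = 6 -> 1
print(weighted_neuron([10, 10], [0.2, 0.1]))  # weighted sum = 3 -> 0
```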

To prevent the network from collapsing into chaos, neurons are interconnected by layers, not randomly. In one layer, neurons are not connected to each other, but to neurons of the next and previous layers. Data moves in one direction in the network - from the inputs of the first layer to the output of the last.

If you throw in enough layers and set the weights correctly, you get the following: feed in, say, an image of the handwritten digit 4; the black pixels activate the associated neurons, those activate the next layers, and so on and so forth, until finally the output responsible for "four" lights up. The result is obtained.

In real-life programming, nobody writes out individual neurons and connections. Instead, everything is represented as matrices and computed with matrix multiplication for better performance. My favorite video on this topic, and its sequel below, explain the whole process in an easily digestible way using the example of handwritten digit recognition. Watch them if you want to delve deeper.
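For a rough sense of what that looks like in code, here is a minimal forward pass written as matrix multiplications with NumPy. The layer sizes (784 → 16 → 10) mirror the handwritten-digit example, and the random weights are stand-ins for trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 784)), np.zeros(16)   # first layer weights and biases
W2, b2 = rng.normal(size=(10, 16)), np.zeros(10)    # output layer weights and biases

def forward(pixels):
    # hidden layer: one matrix multiply = all weighted sums at once, then ReLU
    h = np.maximum(0, W1 @ pixels + b1)
    # output layer: one score per digit class 0..9
    return W2 @ h + b2

image = rng.random(784)            # stand-in for a flattened 28x28 image
scores = forward(image)
print(scores.argmax())             # index of the output neuron that "lights up" most
```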

After creating the network, our task is to assign appropriate weights so that the neurons react correctly to incoming signals. Now it's time to recall that we have data with examples of correct "inputs" and "outputs". We show our network a drawing of that same digit 4 and tell it: "adjust your weights so that whenever you see this input, your output says 4."

After hundreds of thousands of such "infer, then punish" cycles, there is hope that the weights will be corrected and work as intended. The scientific name for this approach is backpropagation, or the "backpropagation of error" method. The funny thing is that it took twenty years to come up with this method. Before that, we still trained neural networks somehow.
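As a bare-bones sketch of that loop (not anyone's production code; the toy XOR data, layer sizes, and learning rate are chosen only for illustration), the whole "show an example, push the error back, nudge the weights" cycle fits in a few lines:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # example inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # correct outputs (XOR)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer: 2 inputs -> 4 neurons
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer: 4 neurons -> 1 output

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for step in range(10000):
    # forward pass: the network's current guess
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass: push the error from the output back through the layers
    err_out = (out - y) * out * (1 - out)
    err_h = err_out @ W2.T * h * (1 - h)

    # nudge every weight a little in the direction that reduces the error
    W2 -= 0.5 * h.T @ err_out
    b2 -= 0.5 * err_out.sum(axis=0)
    W1 -= 0.5 * X.T @ err_h
    b1 -= 0.5 * err_h.sum(axis=0)

print(out.round(2))   # hopefully close to [[0], [1], [1], [0]]
```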

A well-trained neural network can fake the work of any of the algorithms described in this chapter (and frequently does so with greater accuracy). This universality is what made them so popular. "Finally, we have an architecture of the human brain," they said, "we just need to assemble lots of layers and teach them on any data possible." Then the first AI winter came, then it melted, and then another wave of disappointment hit.

It turned out that networks with a large number of layers required unimaginable computing power at the time. Today, any gaming PC with a GeForce card outperforms the data centers of that era. Back then people had no hope of getting that kind of computing power, and neural networks were a big letdown.

And then ten years ago, deep learning rose.

In 2012, convolutional neural networks achieved a landslide victory in the ImageNet competition, causing the world to suddenly remember the deep learning methods described in the ancient 90s.

The difference between deep learning and classical neural networks lay in new training methods that could handle larger networks. Today, only theorists try to draw the line between which learning is deep and which is not so deep. As practitioners, we use deep learning libraries like Keras, TensorFlow, and PyTorch even when we want to build a small network with five layers, simply because they're better than all the tools that came before. And we just call them neural networks.
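For instance, a "small network with five layers" in PyTorch takes only a few lines; the layer sizes below are arbitrary placeholders, the point is how little code the library asks of you:

```python
import torch
from torch import nn

# A five-layer fully connected network, sized as if for flattened 28x28 digit images.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # layer 1
    nn.Linear(256, 128), nn.ReLU(),   # layer 2
    nn.Linear(128, 64),  nn.ReLU(),   # layer 3
    nn.Linear(64, 32),   nn.ReLU(),   # layer 4
    nn.Linear(32, 10),                # layer 5: one score per digit class
)

x = torch.rand(1, 784)                # a fake flattened image
print(model(x).argmax(dim=1))         # predicted class for the random input
```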

 

