Machine Learning - Parts Five and Six

Updated: 29 Mordad 1404

Regression

"Draw a line through these points. Yes, this machine is learning."

Today, this is used for:

Stock price prediction

Analyzing demand and sales volume

Medical diagnostics

Any kind of correlation

Popular algorithms are linear and polynomial regression.

Regression is essentially classification, but instead of a class we predict a number. Examples: a car's price from its mileage, traffic at different times of day, demand volume as the company grows, and so on. Regression is a natural fit whenever something depends on time.

Anyone who works in finance or analytics loves regression. Under the hood it is very simple: the machine just tries to draw a line that captures the average correlation. Unlike a person with a pen and a whiteboard, though, the machine does it with mathematical precision, minimizing the average distance to every point.

When the line is straight, it's linear regression; when it curves, it's polynomial. These are the two main types of regression; the others are more exotic. Logistic regression is the black sheep of the flock: don't let the name fool you, it's a classification method, not regression.
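Both flavors fit in a few lines of NumPy. A toy sketch with made-up mileage/price numbers (the data and function names are mine, purely for illustration):

```python
import numpy as np

# Hypothetical data: car mileage (thousands of km) vs. price (thousands of $).
mileage = np.array([10, 30, 50, 80, 120, 160, 200], dtype=float)
price = np.array([28, 24, 21, 16, 12, 9, 7], dtype=float)

# Linear regression: fit a straight line price ≈ a * mileage + b
# by minimizing the squared distance to every point.
a, b = np.polyfit(mileage, price, deg=1)

# Polynomial regression: the same idea, but the "line" is allowed to curve.
curve = np.polyfit(mileage, price, deg=2)

def predict_linear(x):
    return a * x + b

def predict_poly(x):
    return np.polyval(curve, x)

print(predict_linear(100), predict_poly(100))  # estimated prices at 100k km
```

`np.polyfit` does exactly the "draw a line through these points" part: it picks the coefficients that minimize the total squared distance to the data.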

It's okay to mix up regression and classification. Many classifiers, after some tuning, turn into regressors: instead of only naming an object's class, we can also report how close the object is to it, and that's where regression appears.

Unsupervised Learning

Unsupervised learning was invented a bit later, in the 90s. It is used less often, but sometimes we have no choice.

Labeled data is a luxury. But what if I want to build a bus classifier? Should I manually photograph millions of buses on the street and label each one? No way, that would take a lifetime.

You can try unsupervised learning instead. It is usually useful for exploratory data analysis, but not as the main algorithm. A specially trained human with an Oxford degree feeds the machine a ton of garbage and watches: are there any clusters? Any visible relationships? No? Well, move on then. You wanted to work in data science, right?

Clustering

"Divides objects based on unknown features. The machine chooses the best way."

Used today:

  • To segment the market (customer types, loyalty)
  • To merge close points on the map
  • For image compression
  • To analyze and label new data
  • To detect abnormal behavior

Popular algorithms: K-means clustering, mean-shift, DBSCAN

Clustering is classification without predefined classes. The algorithm tries to find similar objects (by whatever features it has) and merge them into clusters; objects that share many features end up in the same cluster. With some algorithms you can even specify the exact number of clusters you want.

A great example of clustering is markers on web maps. When you search for all the restaurants nearby, the engine groups them into bubbles; otherwise your browser would freeze trying to draw all three million restaurants in the city center.

Apple Photos and Google Photos use more sophisticated clustering. They look for faces in photos to create albums of your friends. The app doesn't know how many friends you have or what they look like, but it tries to find common facial features. Ordinary clustering.

Another popular use is image compression. When saving an image as PNG, you can restrict the palette to, say, 32 colors. Clustering then finds all the "reddish" pixels, computes the "average red", and assigns it to every one of them. Fewer colors, smaller file: profit!
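The "average red" step is just a per-group mean. A tiny NumPy sketch with a hand-made four-pixel "image" and hand-assigned palette slots (real encoders would get the group labels from a clustering algorithm such as K-means):

```python
import numpy as np

# Hypothetical 1-D "image": four RGB pixels, already grouped into 2 palette slots.
pixels = np.array([[250, 10, 10], [240, 20, 15],       # reddish pixels -> slot 0
                   [10, 10, 250], [20, 5, 240]], float)  # bluish pixels -> slot 1
labels = np.array([0, 0, 1, 1])

# The palette is the average color of each group...
palette = np.array([pixels[labels == i].mean(axis=0) for i in range(2)])

# ...and compression replaces every pixel with its group's average color.
compressed = palette[labels]
```

Instead of storing three numbers per pixel, the file now stores the small palette plus one label per pixel.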

However, there can be problems with colors like cyan: is it green or blue? This is where the K-means algorithm comes in.

It randomly places 32 color points in the palette; these become the centroids. Every remaining point is assigned to its nearest centroid, so we get something like 32 galaxies of points around the 32 colors. Then each centroid is moved to the center of its galaxy, and the whole thing repeats until the centroids stop moving.

Done. The clusters are defined, stable, and there are exactly 32 of them.
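The loop just described — place random centroids, assign points, move each centroid to the center of its galaxy, repeat — can be sketched in a few lines. A toy NumPy implementation (function and variable names are mine; this is not production code):

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Toy K-means: returns (centroids, label per point)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k random data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Step 2: assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the center of its "galaxy".
        new_centroids = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Step 4: stop once the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two obvious blobs, so k=2 should separate them.
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
centroids, labels = kmeans(pts, k=2)
```

For the PNG example, the points would be RGB colors and k would be 32; the final centroids become the palette.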

Finding centroids is easy. But in real life, clusters are not always round. Imagine you are a geologist who needs to find similar minerals on a map: the clusters can be oddly shaped and even nested, and you don't even know how many of them to expect.

K-means is not suitable here, but DBSCAN can help. Think of the points as people on a city square. Find any three people standing close together and ask them to hold hands. Then tell them to grab the hands of any neighbors they can reach, and those neighbors grab theirs, and so on until no one can reach anyone new. That's our first cluster. Repeat until everyone is clustered.
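The hand-holding procedure is essentially DBSCAN's density flood-fill. A minimal pure-Python sketch (eps is "arm's reach", min_pts is the "any three people" rule — both are the standard DBSCAN knobs; everything else is simplified):

```python
import math

def dbscan(points, eps=1.5, min_pts=3):
    """Toy DBSCAN: returns a cluster id per point (-1 = noise/outlier)."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:      # too few neighbors: mark as noise for now
            labels[i] = -1
            continue
        # i is a "core" point: start a new cluster and flood-fill outward,
        # telling everyone within reach to hold hands.
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # former "noise" joins as a border point
                continue
            if labels[j] is not None:
                continue              # already claimed by a cluster
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:   # j is also core: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels

# Two square-shaped groups plus one loner far away.
square = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 10), (11, 10), (10, 11), (11, 11),
          (50, 50)]
print(dbscan(square))  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that nobody told the algorithm how many clusters to find, and the loner at (50, 50) is flagged as noise rather than forced into a group — exactly the two things K-means cannot do.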


Just like classification, clustering can be used to detect anomalies. Does a user behave abnormally right after registration? Let the machine temporarily ban them and create a ticket for support to check whether it's a bot. We don't even need to know what "normal behavior" is: we just upload all user actions to the model and let it decide whether this user is typical or not.
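A crude sketch of the idea, using distance to the nearest previously seen user as a stand-in for "far from every cluster" (the feature vectors and the threshold are invented for illustration; a real system would cluster first and measure distance to cluster centers):

```python
import math

# Hypothetical feature vectors for users' first-day activity:
# (actions per minute, fraction of actions that are messages).
normal_users = [(2.0, 0.3), (2.5, 0.4), (1.8, 0.2), (2.2, 0.35), (2.1, 0.25)]

def is_anomalous(user, history, threshold=3.0):
    """Flag a user whose behavior is far from everyone seen so far."""
    nearest = min(math.dist(user, h) for h in history)
    return nearest > threshold

bot = (40.0, 0.99)    # hammering the site: nothing like anyone in history
human = (2.3, 0.3)    # sits comfortably inside the existing cloud of users
```

The model never saw a labeled "bot"; it only knows this point doesn't sit near any cluster of existing behavior.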

This method doesn’t work well compared to classification, but it never hurts to try.
