[Automated Translation!] With every Google update, the rankings of some pages are turned upside down, and from time to time one wonders why some pages are hit and others are not. Some of the pages that get “punished” leave you asking how this can be: that is actually a good page, isn’t it?
It is well known that Google uses machine learning to optimize relevance calculation algorithms. But how exactly does it work? And what does that mean for search engine optimizers?
How Machine Learning works
Imagine we have a large number of documents, and several properties have been measured for each one. One property, for example, is the number of words in a document (X1); another is a measure such as a simplified PageRank of the domain on which the document resides (X2). These values are purely fictitious, and I am in no way claiming that there is a correlation here. They serve only as an illustration.
First the values are brought to the same scale, then a distance matrix is created:
The distance matrix shows the distances between the individual rows. For example, the distance from row 1 to row 3 is smaller than the distance from row 1 to row 4. If you look at the values, these distances make sense. In the next step, clusters are formed and plotted in a dendrogram:
Here, too, it is easy to understand why the values from rows 7 and 10 belong together rather than the values from rows 1 and 3. The machine was able to calculate these clusters from the distances alone.
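The three steps (scaling, distance matrix, clustering) can be sketched in a few lines of plain Python. This is only a minimal illustration: the five documents and their values are made up, rows are counted from 0 rather than 1, and single-linkage merging is an assumption, since the text does not say which linkage method was used for the dendrogram.

```python
import math

# Hypothetical documents: (word count X1, simplified PageRank X2). Made-up values.
docs = [
    (300, 2.0),
    (2500, 7.0),
    (350, 2.5),
    (2400, 6.5),
    (1200, 4.0),
]

def standardize(rows):
    """Bring every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

def distance_matrix(rows):
    """Euclidean distance between every pair of rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]

def single_linkage(dist, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance = distance of the closest members.
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

scaled = standardize(docs)
dist = distance_matrix(scaled)
clusters = single_linkage(dist, n_clusters=3)
print(clusters)
```

Recording the order and distance of each merge would give exactly the information a dendrogram displays.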
What does Machine Learning have to do with Google’s Human Quality Raters?
Now we go one step further. We know that Google has people rate search results on a scale from “highest” to “lowest”. The Rater Guidelines are easy to find. Here again, distances play a vital role as soon as “highest”, “lowest” and all the values in between are each assigned a number.
Of course, the Human Quality Raters cannot review all search results. Instead, certain “regions” are trained, which means that the ratings are used to optimize the algorithm for certain search queries or signal constellations. Unlike in the previous example, we are dealing with supervised learning here because we have a target variable, the rating. If we assume that more than 200 factors are used for the ranking, we could formulate the task for the algorithm like this: adjust the weights of all these factors so that the output matches the target rating.
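To make the idea of "adapting the factors until the target rating comes out" concrete, here is a minimal sketch with two signals instead of 200. Everything in it is an assumption for illustration: the numeric rating scale, the sample data, and the choice of a plain linear model fitted by gradient descent. It says nothing about what Google actually uses.

```python
# Hypothetical numeric scale for the rater labels (an assumption).
RATING = {"lowest": 0.0, "low": 0.25, "medium": 0.5, "high": 0.75, "highest": 1.0}

# Made-up training data: two signals already scaled to 0..1, plus the rater label.
samples = [
    ((0.1, 0.2), "lowest"),
    ((0.3, 0.2), "low"),
    ((0.5, 0.5), "medium"),
    ((0.7, 0.9), "high"),
    ((0.9, 0.8), "highest"),
]

def predict(weights, bias, signals):
    """Weighted sum of the signals."""
    return bias + sum(w * s for w, s in zip(weights, signals))

def mean_squared_error(weights, bias):
    """Average squared distance between prediction and target rating."""
    return sum((predict(weights, bias, x) - RATING[label]) ** 2
               for x, label in samples) / len(samples)

def fit(steps=5000, lr=0.1):
    """Gradient descent: nudge the weights so predictions approach the ratings."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(samples)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, label in samples:
            err = predict(weights, bias, x) - RATING[label]
            for k, s in enumerate(x):
                grad_w[k] += 2 * err * s / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

mse_before = mean_squared_error([0.0, 0.0], 0.0)
weights, bias = fit()
mse_after = mean_squared_error(weights, bias)
print(weights, bias, mse_before, mse_after)
```

The weights that come out depend entirely on the rated samples, which is the point: different "regions" of training data lead to different weightings.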
To get a better understanding of how this works, let’s take another very simplified example, this time using a Support Vector Machine.
The principle behind Support Vector Machines is a simple but quite sophisticated approach to calculating the optimal separation between two different segments. Take the red line in the picture above. It separates the blue circles from the green ones. But it could just as well be rotated a few degrees to the left or right and would still separate the two segments perfectly. And now comes the trick: to calculate the optimal separation, two parallel lines are added, one on each side of the red line, and pushed outward until they touch the nearest circles. The angle at which these two parallel lines are furthest apart is the optimal angle for the red line.
Let us now assume that the two axes are again ranking signals: x1 is the PageRank, x2 the PageSpeed. The data is plotted in a two-dimensional space, and you can see clearly that the two segments are cleanly separated from each other. So we could train our machine on this data and, in the future, classify new elements that enter the space based on what it has learned. And this works not only with 2 variables but also with many. The separating boundary is then no longer a line but what is called a hyperplane.
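The "rotate the line and see where the parallel lines are furthest apart" idea can be tried out by brute force. The sketch below uses made-up points and simply tests every direction in 1-degree steps; real SVM implementations solve an optimization problem instead, so this is an illustration of the geometry and not of how an SVM library works.

```python
import math

# Made-up, cleanly separable points (x1, x2) for the two segments.
blue = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5)]
green = [(4.0, 4.0), (5.0, 3.5), (4.5, 5.0)]

def margin_for_angle(theta):
    """Width of the empty band between the segments when the separating line
    is perpendicular to the direction theta (in radians).
    A negative result means this orientation does not separate the segments."""
    nx, ny = math.cos(theta), math.sin(theta)
    proj_blue = [x * nx + y * ny for x, y in blue]
    proj_green = [x * nx + y * ny for x, y in green]
    return min(proj_green) - max(proj_blue)

# Try every direction in 1-degree steps and keep the widest band.
best_deg, best_margin = max(
    ((deg, margin_for_angle(math.radians(deg))) for deg in range(360)),
    key=lambda pair: pair[1],
)
print(best_deg, best_margin)
```

The points that end up touching the two parallel lines are the ones that give the method its name: they are the support vectors.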
Now, data are not always so cleanly separable. Let’s stay with the PageRank and PageSpeed example. Just because a page has a high PageRank does not mean that it must be fast. In the picture above, some green circles could just as well sit among the blue ones and vice versa. How can a separating line through the segments be calculated then? Quite simply: every circle that is not clearly on “its” side incurs a penalty point. Then we calculate which line, in which position, yields the fewest penalty points. The function that adds up these penalties is called the “loss function”. To put it another way: a Support Vector Machine may classify even “good” pages as “bad”; the trick is to classify as few good pages as possible as bad and vice versa. It is unlikely that all “good” pages have the same properties.
What does this mean for search engine optimizers?
First of all, it confirms what I said over a year ago at the SEO Campixx conference: there is no static weighting; the ranking is dynamic. At Ask.com we trained individual regions, for example for cases with no backlinks, with little text, or for health search queries. There is no one size fits all. Today, we do not have all 200 signals at our disposal to reverse-engineer the ranking per search term.
At the same time, it also becomes clear why pages are sometimes punished that would not have deserved it. It is not because they were judged to be bad, but because they share too many signals that speak for a worse ranking. And since the raters did not consciously look for any particular signals, the algorithm, be it a Support Vector Machine or something else, chose the signals that result in minimal loss. And since we do not have all 200 signals, it is often impossible for us to understand exactly what it might have been. When reverse-engineering, one can only hope that there is something useful among the available signals.
This makes it all the more important to study the Quality Rater Guidelines. What do the raters understand by Expertise, Authority and Trust? What leads to the “highest” rating? Even if it is boring, there is probably no better tip than taking care of the hygiene factors.
Support Vector Machines were developed in the 1960s, when there was no talk of data science yet. Ranking SVMs are also interesting in this context.