You will click this link. It is on the top.

It doesn't matter what we say, players will just click the links on top.

What if links are clicked just because they appear on the top?

On the previous page, we introduced a few factors that can be used to characterise the importance of a page. Now we propose a new, simpler factor:

What if Wikispeedia players just click the first link they see?

Link Position

Now let's analyse the data to see whether players are indeed influenced by the position of a link on the page in Wikispeedia games. We compare the links players clicked against the link that lies along the shortest path to the destination.

This graph shows the link position on a page. The distribution of the link that leads to the shortest path is roughly uniform, but the distribution of the links that the players actually clicked on is heavily biased towards the top of the page.

So we're done! ... right?

Unfortunately not. We need to choose a control group carefully in order to isolate the effect of link position. Ideally, our control group should choose links much as human intuition would. Luckily, we do have a predictor for semantic similarity: Wikipedia graph embeddings! Given a page and a target destination, we can use cosine similarity to find the linked page that is closest to the target destination in the embedding space. This gives us our control group.
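
As a rough illustration of the control choice, here is a minimal sketch in Python. The page names, the two-dimensional vectors and the helper `pick_control_link` are all invented for this example; real graph embeddings have many more dimensions, and the actual pipeline may differ.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pick_control_link(out_links, target, embeddings):
    """Return the outgoing link whose embedding is most similar to the target's.

    Hypothetical helper: `embeddings` maps a page name to its vector.
    """
    target_vec = embeddings[target]
    return max(
        out_links,
        key=lambda page: cosine_similarity(embeddings[page], target_vec),
    )

# Toy embeddings, made up purely for illustration.
embeddings = {
    "Zebra": [0.9, 0.1],
    "Africa": [0.8, 0.3],
    "Telephone": [0.1, 0.9],
    "Horse": [0.95, 0.05],
}

# The current page links to three pages; the player's target is "Zebra".
control = pick_control_link(["Africa", "Telephone", "Horse"], "Zebra", embeddings)
```

With these toy vectors, the control link is the one pointing most nearly in the same direction as the target, regardless of where it sits on the page.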

Confounders

However, there are still confounding factors that can affect the probability of a player clicking a link, and we need to account for them in order to isolate the effect of link position. On the previous page, we identified Pagerank and Category as two factors that greatly affect the link structure of Wikispeedia.

Where are the links on the page?

The following graphs show that the distribution of the link position on the page differs greatly between categories.

The splotch plot below helps us visualise the correlation between the Pagerank and the position of the link on the page for each category.

Think of each plot as a Wikipedia page, where the links in the page get more important as you go from left to right. Going down within a plot is akin to scrolling down a page. The shading of the plot represents how often a link to a page of that category appears at that position with such a level of importance.

Some categories do exhibit a tendency towards a corner. For example, Science and History pages with greater importance are more likely to appear at the top of the page.

The key takeaway for the purposes of our analysis is that there is some correlation between Category, Pagerank and the position of the link on the page.

Unconfounding the Confounders

We can attenuate their effect by calculating a propensity score for each link, which is the probability that a player will click on it given the values of the confounding factors. Afterwards, we can match each clicked link to the link in the control group with the closest propensity score. Matching balances the confounders across the two groups, so any remaining difference in link position is less likely to be explained by Category or Pagerank.
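
The matching step can be sketched as follows. This is only an illustration under stated assumptions: the propensity model is taken to be a plain logistic regression fitted by gradient descent, the two features (a scaled Pagerank and a single category flag) and all numbers are invented, and matching is nearest-neighbour with replacement. None of these choices is confirmed by the analysis itself.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def propensity(x, w, b):
    """Estimated probability that a link with features x was clicked."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def match_nearest(treated_scores, control_scores):
    """For each treated score, the index of the closest control score.

    Matching is done with replacement, so a control link may be reused.
    """
    return [
        min(range(len(control_scores)), key=lambda j: abs(control_scores[j] - s))
        for s in treated_scores
    ]

# Invented features: [scaled Pagerank, 1.0 if the linked page is in Science].
clicked = [[0.9, 1.0], [0.7, 1.0], [0.6, 0.0], [0.8, 0.0]]
control = [[0.2, 0.0], [0.4, 1.0], [0.3, 0.0], [0.5, 1.0]]

w, b = fit_logistic(clicked + control, [1] * len(clicked) + [0] * len(control))
clicked_scores = [propensity(x, w, b) for x in clicked]
control_scores = [propensity(x, w, b) for x in control]

# matches[i] is the index of the control link paired with clicked link i.
matches = match_nearest(clicked_scores, control_scores)
```

The link positions of each matched pair can then be compared directly, since the two links have similar estimated click probabilities from the confounders alone.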

Let's put it to the test!

After controlling for propensity, we see that the control group shifted to slightly favour links at the top of the page. Running an independent t-test on the control and treatment groups, we get a p-value close to zero (p ~ 1e-12) even after balancing the groups.

This indicates that players are indeed influenced by the position of the link on the page.
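
For readers who want to see what such a test computes, here is a small sketch. It assumes the Welch (unequal-variance) form of the independent t-test and uses a normal approximation for the p-value, which is only reasonable for large samples. The relative positions below (0 is the top of the page, 1 is the bottom) are invented and are not the Wikispeedia data.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation (large samples only)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

# Invented relative link positions for clicked links and their matched controls.
clicked_pos = [0.05, 0.10, 0.12, 0.20, 0.08, 0.15]
matched_pos = [0.40, 0.55, 0.35, 0.60, 0.45, 0.50]

t_stat, dof = welch_t(clicked_pos, matched_pos)
p_value = approx_two_sided_p(t_stat)
```

A negative statistic here means the clicked links sit closer to the top of the page than their matched controls.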

Conclusion

In this Data Story, we have provided a temporal analysis of the link structure of Wikipedia from 2007 to 2022. In addition, we have shown that players are indeed more likely to click a link just because it appears higher on the page. This is a good indication that the link structure of a page can influence the user's behaviour. Armed with this knowledge, the user experience of Wikipedia could potentially be improved, or perhaps user behaviour could be manipulated.

Thank you for reading our data story!