Altius Europe Gets Eurovisionary

In 2020 the Eurovision bandwagon will touch down in the Netherlands and Altius is jumping on board with a machine learning model to predict the 2020 Eurovision winner.

A 2019 victory for the Netherlands

Having won the 2019 contest, the Netherlands will host the 65th annual song contest in May 2020. ‘Open Up’ has already been chosen as the 2020 theme and the 41 countries taking part have been announced. The Eurovision bandwagon is on a roll, but what remains to be seen is who will win this year’s contest.

It’s a question that always attracts a lot of attention. Bookmakers predict odds; forums and official Eurovision fan clubs weigh in on the subject and social media goes into overdrive. Last year the bookies got it right, predicting the Netherlands as the victor. However, they also predicted Australia as the second favourite, closely followed by Sweden.

It was in fact Italy that took second place and Russia that came third. In the run-up to the 2019 contest, Italy was predicted as the Eurovision winner by Spotify, which based its predictions on each song’s popularity on its streaming service. Even the poll run by OGAE, the biggest Eurovision fan club network, predicted Italy as the 2019 winner.

What are the odds?

One thing’s for sure, predicting the Eurovision winner is far from easy. But is it an exact science? Strict guidelines on songs and performances give all participants an equal start. The maximum song duration is three minutes; the artist must be over the age of 16 on the day of the final; they must sing live with only an instrumental backing track; and there must be no more than six performers on stage – and strictly no animals!

What other features will be key to the success of the 2020 winner? Our teams at Altius are determined to put their data science skills to the test to predict the results of the final.

Getting started

It’s still early days in the 2020 contest, but because Eurovision has been going since 1956, there’s a lot of historical data to analyse (more than 1500 songs!). We started with datasets from Wikipedia and from ‘datagraver’ on data.world. Due to the complexity and amount of data available, we decided to approach the problem as a series of analyses:

  • In the first analysis described below, we visualised the voting data and built a baseline model for predicting a winner for the 2019 contest, based only on historical votes given and received.
  • In the second and further analyses (still to come, stay tuned!) we will improve the baseline model by including new data sources, such as song lyrics and song features, which will hopefully help us predict the 2020 winner!

From ABBA to Conchita Wurst

We started visualising historical data by building two interactive Power BI reports. The first report features a map that can be filtered to display the number of times a country won the competition (Win Count); the average place they came over time (Average Rank), and their percentage of points received relative to the maximum number of points achievable (Points Ratio). Click on each heading at the top and then click “play” to cycle through the years!

The historical data aggregates over time in the report, so as countries accumulate scores their colours change. After you’ve watched the cycle play through, try selecting a particular year to see the state of play. Take 1987 for example – the UK and Ireland had the most wins so far. How times have changed for the UK! And for the rest of Europe? Until the end of the 1980s, Western Europe dominated the contest, but as the number of contestants grew the winner focus switched to Eastern Europe.

The second report features a bar chart, which tracks every winner’s performance over time. Click play to see the bar chart growing each year as new winners emerge, or select a country from the map to view its place overall.

Looking at country interactions, we’ve made some interesting observations. There are clusters of countries that always exchange points (the Nordics and the Baltics, shown in green on the top right network below). There also appears to be a slight advantage to singing in English and to hosting the contest.

Countries with strong points links (high points ratio) are clustered together

Another observation – Russia receives many points from Latvia, but doesn’t reciprocate the favour.

Average point ratio given to Russia by Latvia (left) and by Russia to Latvia (right)
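A comparison like this one can be sketched in a few lines of Python. The snippet below is only an illustration of how an average points ratio per giver and receiver pair might be computed; the record layout and the placeholder countries "A" and "B" are assumptions, and the numbers are made up rather than taken from the Eurovision data.

```python
from collections import defaultdict

# Illustrative, made-up records (NOT real Eurovision results):
# (year, giver, receiver, points given, max points a single giver can award)
votes = [
    (2001, "A", "B", 12, 12),
    (2001, "B", "A", 0, 12),
    (2002, "A", "B", 10, 12),
    (2002, "B", "A", 2, 12),
]


def average_pair_ratio(records):
    """Average, per (giver, receiver) pair, of points given / max points possible."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of ratios, number of years]
    for _year, giver, receiver, points, max_points in records:
        entry = totals[(giver, receiver)]
        entry[0] += points / max_points
        entry[1] += 1
    return {pair: ratio_sum / count for pair, (ratio_sum, count) in totals.items()}


ratios = average_pair_ratio(votes)

# A positive value means "A" gives "B" more than it gets back.
asymmetry = ratios[("A", "B")] - ratios[("B", "A")]
print(round(asymmetry, 3))
```

Dividing by the maximum a single country can award keeps years with different scoring rules comparable, in the same spirit as the points ratio used elsewhere in this analysis.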

 

Not all points are made equal

Since 1956, the points system and the participating countries (among other criteria) have changed. To compare fairly across the years, we created a metric called “points ratio” (mentioned above): the ratio of points a country receives in the final relative to the maximum number of points achievable. For example, with 41 contestants in 2019, the maximum achievable was 960 points (40×12 judges’ points plus 40×12 public points). A country that received 100 points in total in 2019 would have a points ratio of about 0.1042 (100/960). We initially modelled judges’ and public votes separately, as they differ quite substantially, but public voting was only introduced in 2015, so we weren’t able to find meaningful patterns.
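The points ratio arithmetic is simple enough to write down directly. The sketch below reproduces the 2019 example; the function names and the default parameters (a top score of 12 and two sets of votes) are our own illustrative choices for this example, not a description of the production code.

```python
def max_points(n_countries: int, top_score: int = 12, n_vote_sets: int = 2) -> int:
    """Most points one entry can receive.

    Every other country awards its top score once per vote set
    (for 2019: one judges' set and one public set).
    """
    return (n_countries - 1) * top_score * n_vote_sets


def points_ratio(points_received: float, n_countries: int,
                 top_score: int = 12, n_vote_sets: int = 2) -> float:
    """Points received as a fraction of the maximum achievable."""
    return points_received / max_points(n_countries, top_score, n_vote_sets)


print(max_points(41))                     # 960
print(round(points_ratio(100, 41), 4))    # 0.1042
```

For earlier years, the same functions can be called with different values, for example `n_vote_sets=1` for a contest with a single set of votes.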

Let’s get technical

To build our model we converted the historical dataset into a feature set. The features we created include which countries participate, where each country gets its points from, whether it is the host country and whether its song is in English. We then applied various models: Random Forest, LightGBM, Extreme Gradient Boosting (XGB) and a Neural Network.
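To give a feel for what such a feature set might look like, here is a minimal sketch that turns one entry into a row of features. The field names (`host_country`, `language`, `points_ratio`), the placeholder countries and all values are assumptions made for this example; the real feature set is larger, and model fitting is not shown.

```python
from statistics import mean


def build_features(entry: dict, history: list) -> dict:
    """Build one illustrative feature row for an entry from earlier results."""
    # Only use results from years before the entry's year, to avoid leaking the outcome.
    past = [
        h["points_ratio"]
        for h in history
        if h["country"] == entry["country"] and h["year"] < entry["year"]
    ]
    return {
        "is_host": int(entry["country"] == entry["host_country"]),
        "in_english": int(entry["language"].lower() == "english"),
        "past_mean_points_ratio": mean(past) if past else 0.0,
        "past_appearances": len(past),
    }


# Made-up history and entry (NOT real Eurovision data).
history = [
    {"country": "X", "year": 2017, "points_ratio": 0.2},
    {"country": "X", "year": 2018, "points_ratio": 0.4},
    {"country": "Y", "year": 2018, "points_ratio": 0.9},
]
entry = {"country": "X", "year": 2019, "host_country": "X", "language": "English"}

features = build_features(entry, history)
print(features)
```

Rows like this, one per country and year, could then be passed to any of the models listed above.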

The XGB model has so far proved best at capturing the dynamics of the official judges’ points. For example, Australia appeared most likely to receive the majority of the judges’ points, followed by Sweden and Austria. However, at this time our model’s predictions are proving far from accurate, as we ‘predicted’ only four out of the top 10 in 2019.
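A "how many of the top 10 did we get" check can be expressed as a small helper. This is a sketch of one way to compute it, with a hypothetical function name and made-up placeholder entries; it counts overlap only and ignores the order within the top k.

```python
def top_k_overlap(predicted_scores: dict, actual_ranking: list, k: int = 10) -> int:
    """Count how many of the model's top-k entries also finished in the actual top k."""
    # Rank entries by predicted score, highest first, and keep the first k.
    predicted_top = sorted(predicted_scores, key=predicted_scores.get, reverse=True)[:k]
    return len(set(predicted_top) & set(actual_ranking[:k]))


# Made-up example with placeholder entries (NOT real predictions or results).
predicted = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6, "E": 0.5}
actual = ["B", "D", "A", "C", "E"]  # actual finishing order, winner first

print(top_k_overlap(predicted, actual, k=3))  # 2
```

A rank-aware measure, such as a rank correlation, would be stricter, since it also penalises getting the order wrong.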

This is just the beginning…

With only historical points data we will not be able to predict the winner for 2020. However, we will be including new data sources and features, such as song lyrics, tempo, volume and energy, to give us enough insight to predict which country will win Eurovision in 2020.

Stay tuned for part II of our Eurovisionary series!