The degree to which competitions can be objectively judged depends on the type of competition. For example, athletes competing in a 100m sprint can be objectively ranked by the time they need to cross the finish line. On the other hand, objective judgement is virtually impossible in music competitions, because so much depends on subjective personal opinions, which can differ widely across members of a jury. A famous example is the 1980 Chopin Competition in Warsaw, where Ivo Pogorelich was eliminated in the 3rd round. This prompted Martha Argerich, who declared Pogorelich a ‘genius’, to resign from the jury in protest. Because personal opinions play such an important role in judging music competitions, there is also an increased risk of bias from factors unrelated to the music itself (i.e. confounding factors). For example, it is often said that the order of appearance affects the final ranking of the candidates. Jury members are one potential source of such bias, but the contestants themselves may also be involved in generating it, for example when a specific position within the program of a competition has a psychological effect on a contestant.
The Queen Elisabeth Competition is one of the most prestigious music competitions in the world, and while watching the final round of this year’s edition in May, I wondered to what extent the order of appearance has influenced the final rankings over the years. To assess this influence, I collected the data on order of appearance and ranking in all Queen Elisabeth Competition piano finals since 1952 from the website of the competition: 216 results in total, from 18 piano competitions with 12 finalists each. In the final round of the Queen Elisabeth Competition, 12 pianists give their performances spread across 6 days, where each pianist is randomly (!) allocated a specific time slot on one of the 6 days, either before or after the intermission. Hence, one can imagine the final ranking being influenced (‘biased’) by the day of performance (1 – 6), as well as by the order of performance on a given day (1 or 2). This is visualized in the boxplot below, where prizes won by finalists over the years are divided into categories based on the day of performance and the order of performance on a given day.
Note that for all competitions since 1995, only prizes 1 – 6 have been awarded, and the remaining finalists are unranked. In the plot above, these unranked finalists have been awarded the ‘9.5th prize’ (the average of 7 and 12).
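To make the ‘9.5th prize’ concrete, here is a minimal Python sketch of the placeholder. It is only an illustration, not the code used for the plot: the function name `plotted_prize` and the use of `None` to mark an unranked finalist are assumptions of mine. It also shows that 9.5 is the mean of all six unranked places (7 through 12), not just of the two endpoints.

```python
from statistics import mean

# Places 7 to 12 are the ones left unranked in competitions since 1995.
unranked_places = range(7, 13)

# The placeholder used for the plot: the average unranked place.
placeholder_prize = mean(unranked_places)


def plotted_prize(rank):
    """Return the value to plot for a finalist.

    `rank` is the awarded prize (1 to 12), or None for an unranked
    finalist (hypothetical encoding, chosen for this sketch only).
    """
    return placeholder_prize if rank is None else rank


print(placeholder_prize)    # average of places 7..12
print(plotted_prize(3))     # a ranked finalist keeps the awarded prize
print(plotted_prize(None))  # an unranked finalist gets the placeholder
```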
Quite strikingly, the boxplot shows that final ranking is indeed substantially influenced by day and order of appearance. For example, if you’re the first finalist on the first day, this may mean you have lost before you have played even a single note: the 2nd finalist on the last day was typically ranked 5 to 6 places higher than the 1st finalist on the first day (as measured by the difference in median prize, i.e. the thick black line within each of the gray boxes)! At least equally striking is how the prize depends on the order of performance on a given day. For example, as a finalist you may be happy about being selected to play on the last day. However, it may not do you any good unless you are scheduled to play after the intermission: the 2nd finalist on the last day was typically ranked 5 places higher than the 1st finalist on that same day! More generally, the 1st finalist on a given day typically did quite a bit worse than the 2nd finalist on the same day. Moreover, a 1st finalist on days 4, 5 or 6 typically did worse than a 2nd finalist on days 1, 2 or 3.
The above observations imply that if we wanted to place bets on the final ranking, without any prior knowledge of music, we might actually be able to do quite a bit better than random guessing. For example, suppose we want to predict whether a finalist receives a prize or is unranked (or, for competitions before 1995, whether a finalist ranks between 1 and 6 or between 7 and 12). In this case, by random guessing we would expect to classify 6 out of 12 finalists correctly. Correctly classifying more than 6 finalists as receiving a prize or not is therefore better than expected. (Some technicalities follow, but feel free to skip to the last paragraph of this post.) One way of trying to do better than the expected 6 correct guesses is to use a Random Forest classifier. Doing this in the statistical programming language R, using the randomForest package, gives a specificity of 0.56 and a sensitivity of 0.71, as determined from the out-of-bag predictions for the input samples (p ~= 0.0001 using Fisher’s exact test). Because the two classes are equally large (6 of the 12 finalists in each), these together give a generalization error rate of 37%.
Hence, using a classifier, one would expect to classify 100% – 37% = 63% of finalists correctly as receiving a prize or not, purely based on the day and order of performance of the finalists. Note that 63% of 12 finalists amounts to 7 – 8 finalists, which is 1 – 2 more than expected by random guessing. This again demonstrates the substantial bias in the final ranking of finalists in the Queen Elisabeth Competition, induced by day of performance and by order of performance on a given day.
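For readers who want to check the arithmetic, the short Python sketch below reproduces the error rate and the expected number of correct guesses from the reported sensitivity and specificity. It is a back-of-the-envelope check, not the original R analysis. It assumes the two classes are equally large (6 finalists in each per final, which follows from the setup above), so that overall accuracy is simply the mean of sensitivity and specificity. The variable names are my own.

```python
# Figures reported in the post (rounded as published).
sensitivity = 0.71
specificity = 0.56
finalists_per_final = 12

# Assumption: balanced classes (6 'prize' and 6 'no prize' per final),
# so overall accuracy is the plain average of the two rates.
accuracy = (sensitivity + specificity) / 2
error_rate = 1 - accuracy

# Expected number of correctly classified finalists in one final,
# compared with the 6 expected from random guessing.
expected_correct = accuracy * finalists_per_final
gain_over_chance = expected_correct - finalists_per_final / 2

print(f"accuracy:          {accuracy:.3f}")
print(f"error rate:        {error_rate:.3f}")
print(f"expected correct:  {expected_correct:.2f} of {finalists_per_final}")
print(f"gain over chance:  {gain_over_chance:.2f} finalists")
```

Since the published sensitivity and specificity are themselves rounded, the result matches the 37% and 63% quoted above only up to rounding.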