Tennis, statistics, tornadoes

Matteo Quartagno
Apr 5, 2020

Quasi-Simpson, Break Points, Conversion rates and Roger Federer

Tennis has always been one of my favorite sports. I played tennis competitively for a few years when I was a teen, but I was just awful. I remember playing against kids who were about 7 years younger (and 50cm shorter) than me and only winning the odd set. A couple of years ago I found out that one of those kids was by then ranked 152 in the world. Since then, the story I tell is that I once beat a top-200 tennis player. I generally omit the fact that he was still years away from puberty when it happened.

Some of you may have noticed that the title of this post is a tribute to the great David Foster Wallace. In Tennis, Trigonometry, Tornadoes, he talked about how becoming a good tennis player is not (only) about having great physical abilities, but (also) about quick thinking and a good mathematical mind:

My flirtation with tennis excellence had way more to do with the township where I learned and trained and with a weird proclivity for intuitive math than it did with athletic talent. I was, even by the standards of junior competition in which everyone's a bud of pure potential, a pretty untalented tennis player. My hand-eye was OK, but I was neither large nor quick, had a near-concave chest and wrists so thin I could bracelet them with a thumb and pinkie, and could hit a tennis ball no harder or truer than most girls in my age bracket. [...] Unless you're one of those rare mutant virtuosos of raw force, you'll find that competitive tennis, like money pool, requires geometric thinking, the ability to calculate not merely your own angles but the angles of response to your angles. Because the expansion of response-possibilities is quadratic, you are required to think n shots ahead, where n is a hyperbolic function limited by the sinh of opponent's talent and the cosh of the number of shots in the rally so far (roughly). I was good at this. What made me for a while near-great was that I could also admit the differential complication of wind into my calculations; I could think and play octacally.

From Tennis, Trigonometry, Tornadoes, by David Foster Wallace

Either I do not have as mathematical a mind as I'd like to think, or my athleticism is so poor that it never allowed me to get even close to becoming a good tennis player. Yet one thing I have always liked doing is analysing pro tennis results (I know, possibly the saddest way to end this paragraph).

From Middle to Modern Ages

Tennis is one of the most interesting sports to analyse, both mathematically and statistically. This is, first of all, because of the way the score is organised. In football (soccer), whoever scores the most goals wins. In basketball, whoever scores the most points wins. In tennis this is not always the case, because the score is organised in three levels: points, games and sets.

Most matches are played in a best-of-3-sets format, while men's matches at the four Grand Slam tournaments (Australian Open, Roland Garros, Wimbledon and US Open) are the only ones still played as best of 5. To win a set, a player needs to win 6 games with a margin of at least 2 games. Because players alternate serving in successive games, this means that to win a set a player has to win at least one game when they are at a theoretical disadvantage, that is, when they are not serving. When this happens, we say that the player has broken the opponent's serve.

Finally, to win a game, a player has to score at least 4 points, and two more than the opponent. This is where the famous weird tennis scores (15-30-40) come into play.

Tennis is one of the oldest sports in the world, with King Henry VIII and Mary, Queen of Scots, listed among the early fans of the game. To keep score, players used a clock: for every point won, they moved the hand 15 minutes ahead. Hence 15, 30, 45, game. However, remember that a player had to win the game by at least 2 points. If two players ended up at 45-45, how would they move the clock after the next point? The solution was to move the hand only to 40 when a player scored the third point in a game, and to 45 only when they scored a point with the score at 40-40. Nowadays no clock is used, and the terms advantage and deuce are used in place of 45 and 40-40, but apart from that the strange scoring system has remained.
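
For readers who like to see the rules written down, here is a minimal Python sketch of how point counts map onto the modern labels. The function `call_score` and its output strings are my own illustration, not any official notation. It assumes the server's score is given first and uses "0" rather than "love".

```python
LABELS = ["0", "15", "30", "40"]  # labels for 0, 1, 2 and 3 points


def call_score(server, returner):
    """Score of a standard game from points won, server first.
    Illustrative labels only: "0" is used instead of "love"."""
    # A game needs at least 4 points and a 2-point margin.
    if server >= 4 and server - returner >= 2:
        return "game server"
    if returner >= 4 and returner - server >= 2:
        return "game returner"
    if server >= 3 and returner >= 3:
        # From 40-40 onwards only the lead matters: deuce or advantage.
        if server == returner:
            return "deuce"
        return "advantage server" if server > returner else "advantage returner"
    return f"{LABELS[server]}-{LABELS[returner]}"


print(call_score(2, 1), call_score(3, 3), call_score(4, 3))  # 30-15 deuce advantage server
```

Note how the old 45 survives only as "advantage": once both players have reached 40, the point counts themselves no longer matter, only who is ahead.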

With the original scoring system, a set could last for hours. Of course, in modern times, when revenue is the main worry and the attention span of the average spectator is quite short, some of the rules have been revised. The main change, now well established, is the tie-break: a single game, usually played to 7 points with a 2-point margin and with the players alternating on serve, that decides a set when neither player has managed to win it by the time the score reaches 6-6.

The (quasi-)Simpson's paradox

Technically, a player can lose a match while winning more points than the opponent. But what is the theoretical maximum proportion of points that a player can win and still lose the match? Let's consider a best-of-5 match that our friend Roger manages to lose despite being overwhelmingly the best player on court. Let's start from the top level of the score. Since Roger has lost the match, the most honourable defeat possible is 3-2 in sets. Now let's take each set. In the two sets won, the best possible result is a double 6-0 in which Roger wins every point, 24 in a row: what is sometimes called a golden set.

This is a very rare occurrence. It has happened only once in the history of Grand Slam matches, when the Kazakh Yaroslava Shvedova smashed the poor Sara Errani at the All England Club in 2012. Curiously, Wikipedia lists only 14 players, men and women, who have won a golden set in the recent history of the game, and this includes the player I once beat 15 years ago (OK, I'll stop bragging about beating a 6-year-old boy, who otherwise beat me thousands of times despite possibly still wearing a nappy).

In the three lost sets, the most honourable score would be a triple 6-7. In each of them Roger holds serve to love in six games (that is, winning every point), loses the other six games "at 30" (that is, scoring 2 points in each) and loses the tie-break 5-7. A quick calculation tells us that in this scenario Roger would win:

4*6*2 = 48 (points in the two sets won) +
4*6*3 = 72 (points in the six games won in each of the three sets lost) +
2*6*3 = 36 (points in the six games lost in each of the three sets lost) +
5*3 = 15 (points in the three tie-breaks lost)
= 171 points

The total number of points played in the match would be:

24*2 = 48 (points in the 2 golden sets won by Roger) +
6*6*3 = 108 (points in the games lost by Roger in the 3 lost sets) +
4*6*3 = 72 (points in the games won by Roger in the 3 lost sets) +
12*3 = 36 (points in the tie-breaks)
= 264 points

Hence, the maximum proportion of points won in a lost match could be as high as 171/264 = 64.8%. Repeating the same calculation for a best-of-3 match, the maximum proportion is only marginally lower, at 63.1%. Of course, this is just the theoretical maximum. In reality, when a player does lose while winning more points than the opponent, it is generally by a much tighter margin. In the last 30 years, approximately 4-5% of professional tennis matches have been won by the player scoring fewer points. This phenomenon is known as the Quasi-Simpson paradox.
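
If you want to check the arithmetic, the short Python sketch below rebuilds both figures from the same assumptions used above. Every set Roger wins is a golden set. Every set he loses is a 6-7 with six love holds, six games lost at 30 and a tie-break lost 5-7. It also assumes a tie-break in every set. The function name `max_share_in_lost_match` is just an illustrative label.

```python
from fractions import Fraction


def max_share_in_lost_match(sets_to_win):
    """Most points a player can win while losing a match with tie-breaks in
    every set. Returns (points won, points played, share as a Fraction)."""
    won_sets, lost_sets = sets_to_win - 1, sets_to_win
    # A golden set: 6 games won to love, 6 * 4 = 24 points, all of them won.
    golden_won = golden_total = 6 * 4
    # A set lost 6-7: six holds to love (4-0), six games lost at 30 (2-4)
    # and a tie-break lost 5-7.
    lost_set_won = 6 * 4 + 6 * 2 + 5                # 41 points for Roger
    lost_set_total = 6 * 4 + 6 * (2 + 4) + (5 + 7)  # 72 points played
    won = won_sets * golden_won + lost_sets * lost_set_won
    total = won_sets * golden_total + lost_sets * lost_set_total
    return won, total, Fraction(won, total)


for label, sets_to_win in (("best of 5", 3), ("best of 3", 2)):
    won, total, share = max_share_in_lost_match(sets_to_win)
    print(f"{label}: {won}/{total} = {float(share):.1%}")
```

This prints 171/264 = 64.8% and 106/168 = 63.1%. Why are the games lost at 30 rather than after a few deuces? Each extra deuce adds one point to each player, that is, points won at a 50% rate, and these can only drag a share above 50% downwards. The same argument explains why the tie-breaks are lost 5-7 rather than, say, 10-12.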

Managing to talk about TB in a post about tennis

Edward Simpson was a British statistician who started his career during the Second World War. He worked as a code-breaker at the famous Bletchley Park mansion at the same time as Alan Turing, although the two did not really work together. After the war, he became famous for a paper describing a mesmerising statistical paradox. His paper was fairly technical and was the first to describe the problem precisely. However, similar examples had been given years before by other (more) famous statisticians like Udny Yule and Karl Pearson, and by philosophers like Morris Cohen and Ernest Nagel. To illustrate the problem, let me use the example these last two gave in 1934, 17 years before Simpson's seminal paper.

Nagel and Cohen looked at death rates from TB in two American cities: Richmond and New York City. They noticed that the overall death rate was higher in Richmond than in NYC. However, when they split the data by ethnicity (White vs Other), they noticed that within both "ethnic groups" mortality was actually lower in Richmond than in NYC. The implication was clear: both white and non-white Americans were better off in Richmond, but Americans as a whole (without considering ethnicity) were better off in New York. How was this possible? The reason lies in the different make-up of the two cities. Non-white residents had higher TB mortality and made up a much larger proportion of the population in Richmond. The implication is again clear. If you had to decide where to live, whatever your ethnicity, Richmond offered a lower probability of dying of TB. But if someone lived in Richmond, then without knowing anything else about them, we would assume they had a higher probability of dying of TB than a random person living in NYC.
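
To see how this can happen, here is a small Python sketch with invented round numbers. They are not Cohen and Nagel's historical figures; they were chosen only to reproduce the same pattern, and the city labels and counts are hypothetical.

```python
# Invented (deaths, population) counts, NOT the historical TB data: they are
# chosen only to reproduce the pattern Cohen and Nagel described.
CITIES = {
    "Richmond-like": {"white": (130, 80_000), "other": (150, 47_000)},
    "NYC-like": {"white": (8_400, 4_700_000), "other": (510, 92_000)},
}


def rate_per_100k(deaths, population):
    return 1e5 * deaths / population


def overall_rate_per_100k(groups):
    """Pooled rate: total deaths over total population, ignoring the groups."""
    deaths = sum(d for d, _ in groups.values())
    population = sum(n for _, n in groups.values())
    return rate_per_100k(deaths, population)


for city, groups in CITIES.items():
    by_group = {g: round(rate_per_100k(*dn), 1) for g, dn in groups.items()}
    print(city, by_group, "overall:", round(overall_rate_per_100k(groups), 1))
```

Each group's death rate is lower in the Richmond-like city, yet its pooled rate is higher. The only reason is that a much larger share of its population belongs to the higher-risk group.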

“Simpson’s Paradox”, from RJ Andrews' twitter account

Coming back to our tennis example, the link is clear: Richmond is Roger, and NYC is Novak, his nemesis. The ethnic groups are the different sets. In our extreme match, Roger scores more points than his opponent in every single set, even in the three he loses (41 points to 31 in each). This mirrors Richmond doing better within every group, and yet Roger loses overall. There is a slight difference between the two examples, though. Within sets we look at the player who scores more points, but overall victory goes to the player who wins more sets. Hence the outcome variable differs between the two levels: points within sets, and sets within the whole match. This is not the case in the TB example, where we compared death rates both within an ethnic group and across groups. For this reason, this is called a Quasi-Simpson paradox.

Is the GOAT fragile?

It is no coincidence that we called our imaginary player Roger. Unless you have been living on the moon for the past 20 years, you probably know that the Swiss Roger Federer has rewritten the history of the game. He has been ranked number 1 in the world for longer than anybody else (310 weeks, until Djokovic breaks this record) and has won more Grand Slam titles than anybody else (20, one more than Rafa Nadal so far). But, as incredible as his numbers are, it is not (or at least not just) because of them that Roger became the icon he is. It is because his tennis is almost art. You never know what is going to come off his racket next. His one-handed backhand can fire either a topspin winner or a slice that draws the opponent to the net. He can play effective old-school serve-and-volley in an era when almost everybody plays exclusively from the baseline. He even invented return-and-volley, something he called SABR (Sneaky Attack By Roger). Novak Djokovic is possibly the most complete player, considering the sum of all components. Rafa Nadal is the most incredible fighter and defender in the world. But neither of them is Roger. Roger is pure genius.

So why did we use his name to talk about the tennis Quasi-Simpson paradox? There is one area in which critics have always accused Roger of failing to excel: stepping up at the most important moments of a match. Remember, not all points carry equal weight in tennis. A break point weighs far more than, say, a point played when you are 0-40 down on your opponent's serve in the first set. Roger seems not to take this into account, as he always plays at his top level, no matter what the score is. Because of this, a very peculiar statistic has emerged. Throughout his career, and up to 2018, Roger played 40 matches that ended in a Quasi-Simpson paradox, that is, where the winner scored fewer points than the loser. Out of these, he won only 7. This means that for every match he won while scoring fewer points than the opponent, he lost about five while scoring more (33 against 7).

Given this statistic, it is tempting to conclude that the critics of Federer's mental strength are right, and that he does lack some ability to raise his game when it is most needed. At first, these data may even suggest that he somehow feels the pressure at the most important moments and lets his level drop.

Lisi and friends, though, suggested that this might not be the case. We saw in a previous post that a common strategy for dealing with results affected by random variation is to calculate the probability of observing these results (or even more extreme ones) under a "null hypothesis" of some theoretical or practical importance. This approach is usually taken when we have some sort of control over the design of the study. Here, we only have 40 observations, so the sample size is definitely too small to conclude anything sensible. What we can do, though, is use another of the statistician's secret weapons: simulation.

What would happen if...

I may talk about simulations more widely in a future post. For the moment, it's enough to think of simulation as a lazy statistician's weapon, because most of the time it is possible (and better) to obtain results from simple mathematical calculations. When systems are particularly difficult to analyse mathematically, though, running simulations is incredibly easy these days, so similar results can be obtained at a much lower cost. In a simple frequentist analysis, we calculate the probability of obtaining certain results if a hypothesis is true. The simple simulation approach, then, is to generate data artificially assuming that the hypothesis is indeed true.

Lisi and friends did exactly that: they simulated the results of 50 000 tennis matches for different strength gaps between the opponents. What they found may at first appear an even bigger paradox than Simpson's: the stronger a player, the lower the proportion of Quasi-Simpson paradox matches they win. The explanation, though, is quite simple. If a player is much stronger than the opponent, they usually win both the match and more points. But for a stronger player, winning more points is much more reliable than winning the match, again because not all points carry the same weight. So if a match ends in a Quasi-Simpson paradox, it is more likely to be because the stronger player has lost. Hence, even assuming that no strategy is involved, the peculiarity of the tennis scoring system suggests that Roger's stats might even be a testament to his superiority.
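
To get a feel for this, here is a minimal Monte Carlo sketch in Python. It is not Lisi and friends' code, just an illustration under simple assumptions. Every point is independent. Each player wins a fixed (hypothetical) share of points on their own serve. Every set, including the last, is decided by a tie-break at 6-6. Player 0 is the stronger one, and the function names and serve probabilities are my own choices.

```python
import random


def play_game(p_server, rng):
    """One standard game; returns (server_points, returner_points)."""
    s = r = 0
    while not ((s >= 4 or r >= 4) and abs(s - r) >= 2):
        if rng.random() < p_server:
            s += 1
        else:
            r += 1
    return s, r


def play_tiebreak(first, p_serve, rng):
    """First to 7 points by a margin of 2. `first` serves point 1, then the
    serve changes every two points. Returns [points of player 0, player 1]."""
    pts, k = [0, 0], 0
    while not (max(pts) >= 7 and abs(pts[0] - pts[1]) >= 2):
        server = first if ((k + 1) // 2) % 2 == 0 else 1 - first
        point_winner = server if rng.random() < p_serve[server] else 1 - server
        pts[point_winner] += 1
        k += 1
    return pts


def play_set(server, p_serve, rng):
    """Returns (set winner, [points 0, points 1], server of the next set)."""
    games, pts = [0, 0], [0, 0]
    while True:
        if games == [6, 6]:
            tb = play_tiebreak(server, p_serve, rng)
            pts = [pts[0] + tb[0], pts[1] + tb[1]]
            # The player who received first in the tie-break serves next.
            return (0 if tb[0] > tb[1] else 1), pts, 1 - server
        s, r = play_game(p_serve[server], rng)
        pts[server] += s
        pts[1 - server] += r
        games[server if s > r else 1 - server] += 1
        server = 1 - server
        if max(games) >= 6 and abs(games[0] - games[1]) >= 2:
            return (0 if games[0] > games[1] else 1), pts, server


def play_match(p_serve, sets_to_win, rng):
    """p_serve[i] = probability that player i wins a point on their own serve."""
    sets, pts = [0, 0], [0, 0]
    server = rng.randrange(2)
    while max(sets) < sets_to_win:
        set_winner, set_pts, server = play_set(server, p_serve, rng)
        sets[set_winner] += 1
        pts = [pts[0] + set_pts[0], pts[1] + set_pts[1]]
    return (0 if sets[0] > sets[1] else 1), pts


def quasi_simpson(p_strong, p_weak, n_matches=10_000, sets_to_win=2, seed=1):
    """Player 0 is the stronger player. Returns (number of Quasi-Simpson
    matches, number of those won by the stronger player)."""
    rng = random.Random(seed)
    qs = qs_strong = 0
    for _ in range(n_matches):
        winner, pts = play_match([p_strong, p_weak], sets_to_win, rng)
        if pts[winner] < pts[1 - winner]:  # the winner scored fewer points
            qs += 1
            qs_strong += winner == 0
    return qs, qs_strong


# Hypothetical serve-win probabilities: 70% for the stronger player, 60% for the weaker.
qs, qs_strong = quasi_simpson(0.70, 0.60)
print(f"{qs} Quasi-Simpson matches, {qs_strong} won by the stronger player")
```

With these made-up numbers, the stronger player should end up winning well under half of the Quasi-Simpson matches, even though, by construction, no point is ever played "under pressure". Changing the two probabilities is an easy way to see how the share moves with the strength gap.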

The chance to win a game as a game of chance

There is another statistic that is often used to criticise Roger's mental strength, though: the break point conversion rate. As I explained at the beginning of this post, break points are points that, if won, give a player a break, that is a game won while returning. Since serve gives an important advantage, particularly when playing on faster surfaces like grass, winning a game on the opponent's serve is often enough to win a set. For this reason, break points are the most important points in a match.

For years, it has been suggested that Federer's conversion rate is not up to his standards. Let's check this by comparing his data with those of the two other greatest players of the 21st century, Rafa Nadal and Novak Djokovic. Data are available on the website www.ultimatetennisstatistics.com, the Bible of data nerds.

Roger won 41.3% of his break points throughout his career, over a total of 11822 chances. Rafa stands instead at 44.9% and Novak at 44.5%. At first sight the difference may seem quite substantial, but let's try to repeat Arbuthnot's reasoning. Let's calculate the probability of observing such a large difference (or an even larger one!) between the conversion rates if the rates were in fact identical. If we do that, we find that the probability is infinitesimal, so the conversion rates are very likely different. This should come as no surprise at all. The three are different players, and they are likely to win different proportions of points anyway. Given enough data, even a difference as small as 1% could come out as significant. Here we have quite a large data set, with around 10 000 observations per player, so even small differences can come out as "significant". For this reason, I believe it is better to focus on estimating intervals likely to contain the true difference. In my opinion, this is the better approach whenever the study has not been designed to answer a specific question, that is, whenever the sample is simply made up of all the observations we could collect.

Doing so, we find that we can be 95% confident that the true difference between Roger and Rafa's conversion rates is somewhere between 2.4% and 5.0%, and between Roger and Novak between 1.9% and 4.7%. The differences seem indeed quite important for players of similar strength. However, there are possible caveats:
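
One standard way to produce a p-value and an interval like these is the normal approximation for the difference between two proportions. The Python sketch below takes that approach, though it is not necessarily the exact calculation behind the intervals quoted above. It uses Roger's figures (41.3% of 11822) and Rafa's 44.9%. Rafa's number of break points is not given here, so it assumes 10 000 as a placeholder.

```python
from math import sqrt
from statistics import NormalDist


def compare_proportions(p1, n1, p2, n2, level=0.95):
    """Normal-approximation comparison of two proportions.
    Returns (p2 - p1, Wald confidence interval, two-sided p-value of the
    pooled z-test for the null hypothesis p1 == p2)."""
    diff = p2 - p1
    # Unpooled standard error, used for the confidence interval.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    interval = (diff - z_crit * se, diff + z_crit * se)
    # Pooled standard error under the null hypothesis of equal rates.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff) / se0))
    return diff, interval, p_value


# Roger: 41.3% of 11822 break points. Rafa: 44.9%, with n = 10 000 ASSUMED.
diff, (lo, hi), p = compare_proportions(0.413, 11822, 0.449, 10_000)
print(f"difference {diff:.1%}, 95% CI [{lo:.1%}, {hi:.1%}], p = {p:.1e}")
```

With these inputs, the interval comes out at roughly 2.3% to 4.9%, with a p-value well below one in a million. The exact bounds depend on Rafa's real number of chances.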

1) Novak Djokovic is known as the best returner in the world, possibly the best in the history of the game. Hence, we would expect him to convert a higher proportion of break points simply because he wins a higher proportion of return points in general. Indeed, if we do a similar calculation considering all return points, we find that Novak's probability of winning a return point was between 2% and 3% higher than Roger's. The magnitude of the difference is similar to that for break points, and the interval is narrower, reflecting the larger amount of information (about ten times as many points).

2) Rafa Nadal is known as the best clay-court player in the world, possibly the best in the history of the game. Clay is a much slower surface, where the advantage of serving is much smaller than on grass or hard courts, so the proportion of return points won on clay is likely to be higher. Indeed, if we focus only on break points on grass, Roger has a slightly higher conversion rate than Rafa, with a difference only marginally smaller than the one found when considering all return points.

Taken together, these two observations suggest that the differences between the three are explained by their technical characteristics rather than by their mental strength. Conversely, Roger is by far the best of the three when it comes to serving. He hits aces on more than 10% of his service points, compared with 4% for Nadal and 7% for Djokovic, and wins a similarly higher proportion of first-serve points. Of course, nobody would ever hypothesise that this is because Nadal or Djokovic are mentally weaker on serve.

The last thing we can do is compare the proportion of break points won by Roger with the proportion of non-break return points he won. The null hypothesis here is that the probability of winning a point is the same whether or not it is a break point. Doing this, we find that, surprisingly at first, the probability of winning a point is actually higher when it is a break point: the difference is likely to lie in the interval [0.8%, 2.7%]. However, this is scarcely interesting, for a simple reason. Break points are, on average, likely to be played against weaker servers than non-break return points. Against a big server like the more than 2m tall Ivo Karlovic, it is very difficult to win any point on his serve, so break points are rare. There will be many more break points against average servers, and hence we expect a higher probability of winning a point when it is a break point. This is indeed true for pretty much every player for whom enough data are available, particularly on fast surfaces.

On clay things change slightly because the serve is not as important, but the general reasoning remains the same. Here comes the only truly surprising statistic: the proportion of non-break return points won by Roger on clay is slightly larger than that for break points. The difference is extremely small, and the likely interval is broadly centred on zero, but this still remains possibly the only surprising result.
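
This selection effect is easy to reproduce. In the minimal Python sketch below, a returner wins every point against a given server with the same fixed probability, whether or not it is a break point, so there is no "clutch" effect by construction. The two server types and their probabilities are hypothetical. Pooling the games nonetheless makes break points look easier to win, because they pile up against the weaker server.

```python
import random


def play_return_game(q, rng, tally):
    """Play one game on the opponent's serve. The returner wins each point
    with probability q, regardless of the score (no clutch effect).
    tally[is_break_point] = [points won, points played] is updated in place."""
    server = returner = 0
    while not ((server >= 4 or returner >= 4) and abs(server - returner) >= 2):
        # A break point: winning this point would win the game for the returner.
        is_bp = returner >= 3 and returner - server >= 1
        won = rng.random() < q
        tally[is_bp][0] += won
        tally[is_bp][1] += 1
        if won:
            returner += 1
        else:
            server += 1


def pooled_rates(server_qs, games_per_server=20_000, seed=7):
    """Return (break-point win rate, non-break return-point win rate),
    pooled over an equal number of games against each hypothetical server."""
    rng = random.Random(seed)
    tally = {True: [0, 0], False: [0, 0]}
    for q in server_qs:
        for _ in range(games_per_server):
            play_return_game(q, rng, tally)
    return tally[True][0] / tally[True][1], tally[False][0] / tally[False][1]


# Hypothetical servers: a big server (returner wins 25% of points) and a weak one (45%).
bp_rate, other_rate = pooled_rates([0.25, 0.45])
print(f"break points won: {bp_rate:.3f}, other return points won: {other_rate:.3f}")
```

With `[0.25, 0.45]`, the pooled break-point rate should come out several percentage points higher than the non-break rate. With a single server type, the two rates agree up to noise. This is, once more, a Simpson-style effect of pooling.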

Overall, all these analyses suggest that Roger Federer's troublesome relationship with Quasi-Simpson paradox matches may actually be due to his superiority, and that his conversion rate is not that bad, possibly with the exception of clay matches. To be honest, nobody should even be allowed to think that someone able to win 20 Grand Slam tournaments could be mentally weak, but since I've heard this thesis more than once, I thought it would be interesting to test it with data.

And to be even more honest, I am sure it will not be for these numbers that Roger Federer will be remembered, but for stuff like this:
