Aging Curves For IndyCar Drivers

Aging curves are very popular on baseball analytics sites like Fangraphs and Baseball Prospectus. They give a forecast as to how a typical player ages over time and how their batting average, WAR, or some other statistic might change with the seasons to come. They give a best guess as to how a player will age, so while they are by no means a perfect representation of player aging, they are helpful. The real benefit of aging curves is that they can help us forecast a player’s (or driver’s in our case) performance in future seasons based on how he’s performed so far. No one has really tried to do the same for IndyCar, though David at Motorsports Analytics has done something similar for NASCAR, so I thought I’d give it a try.
The basic technique behind constructing an aging curve is this: look at back to back seasons for many different drivers and see how a statistic (for example average finishing position or AFP) changes between those years. Put these “changes” into a bucket representing the ages, find the average change from say age 32 to 33, and then do this for all of the age pairings in your data set. For example, in 2013 Scott Dixon was 32 years old and had an average finish of 8.2. The next season he was 33 and had an average finish of 8.3, so his change between ages 32 and 33 is +0.1 average finishing position. This would go into the general 32/33 bucket for every driver who raced in back to back seasons at those ages. Once all of the 32/33 changes are put into the bucket, the entire bucket is averaged together to get the average change for a typical driver between those ages. 
This is called the delta method for constructing an aging curve because it looks at the average change from one year to the next in back to back seasons. I created three different aging curves for IndyCar drivers: one for average starting position, one for average finishing position, and one for championship position. I looked at 43 different drivers. To be included in the data set, a driver had to race in at least two back to back seasons from 2002-2017 and start at least five races in each of those seasons. This provided a total of 273 seasons from which to develop the aging curves.
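To make the bucketing step concrete, here is a minimal Python sketch of the delta method as described above. It is not the code I used for the curves. The function name `average_deltas` and the drivers labeled "Driver B" and "Driver C" are made up for illustration; only the Scott Dixon numbers come from the example above. It also assumes each driver has one season per age, so consecutive ages stand in for back to back seasons.

```python
from collections import defaultdict


def average_deltas(seasons):
    """Average year-over-year change in a statistic for each age pair.

    `seasons` is an iterable of (driver, age, stat) tuples, one per
    driver-season. Only consecutive ages for the same driver are paired.
    """
    # Group each driver's seasons by age.
    by_driver = defaultdict(dict)
    for driver, age, stat in seasons:
        by_driver[driver][age] = stat

    # Drop every (age -> age + 1) change into its bucket.
    buckets = defaultdict(list)
    for ages in by_driver.values():
        for age, stat in ages.items():
            if age + 1 in ages:
                buckets[(age, age + 1)].append(ages[age + 1] - stat)

    # Average each bucket to get the typical change for that age pair.
    return {pair: sum(changes) / len(changes)
            for pair, changes in sorted(buckets.items())}


sample = [
    ("Scott Dixon", 32, 8.2),  # from the example above
    ("Scott Dixon", 33, 8.3),
    ("Driver B", 32, 12.0),    # hypothetical
    ("Driver B", 33, 11.5),    # hypothetical
    ("Driver C", 32, 15.0),    # hypothetical; no age-33 season, so unused
]

for pair, change in average_deltas(sample).items():
    print(pair, round(change, 2))
```

With this toy data the 32/33 bucket holds two changes (+0.1 and -0.5), so the average change for that age pair comes out to about -0.2 places. Driver C contributes nothing because he has no back to back season.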
Here is a look at the aging curve for average finishing position. 
If you have ever seen an aging curve for baseball before, you might be a little confused by this. In baseball, players want to have a high batting average or WAR, but in racing and with the statistics we’re looking at, you actually want a lower number. So the aging curve looks flipped when compared to one for baseball. The y-axis is the change in average finishing position from peak year. Moving down the y-axis indicates a decrease in average finishing position (ex: moving from an average finishing position of 17th to one of 11th). For example, a 23-year-old driver is projected to shave a little over four places off of his current average finishing position by the time he reaches age 28, which is the peak age for average finishing position. So if a driver has an average finishing position of 14th when he’s 23, we’d expect him to have an average finishing position of about 10th in five years’ time, based on how a typical driver ages. This graph also shows the year to year change for different age couplets. Going from age 34 to 35, a typical driver’s average finishing position increases by 0.4 places, meaning he is less successful than the year before.

The key to remember when reading aging curves is that an increase in average finishing/starting/championship position is a bad thing (AFP of 14.5 going to 16.7) and a decrease is a good thing (AFP of 20.1 going to 14.2). It doesn’t sound quite right at first, but that’s just the nature of starting/finishing/championship position in IndyCar.

Overall, the average finish curve shows us that drivers drop a little over half a place off of their average finishing position per year until age 28. After that, it is a gradual increase as the driver comes out of their prime years and starts moving down the grid again. Drivers lose their abilities much more slowly than they gain them, so a fair number of drivers even have better seasons than expected as they get older.

The aging curve for average starting position is read exactly the same way as average finishing position, with lower numbers meaning a driver is closer to the peak age performance.

What’s interesting about this aging curve is the dip that occurs when going from age 20 to 21, even though the peak age still turns out to be 28. This clearly isn’t what we’d expect to happen from a “typical” driver, and it’s caused by the relatively small sample size of drivers in that age group competing in back to back seasons. When using this aging curve for projections, I adjust for this anomaly by providing less weight to that couplet.

Drivers seem to lose their qualifying abilities quicker than their racing abilities: by age 38, a driver has lost about 4 places off of their peak average starting position compared to just 2 places off of their average finishing position at the same age. This speaks to the idea that being consistent and safe throughout a race can pay off in the long run. In qualifying, drivers only need to get through one quick lap to place high. In a race, you need consistent laps and to stay out of trouble for over an hour and a half of racing in order to finish in a good position. Drivers who have a lot of experience are able to maintain high average finishes for a long time because they know this better than the younger drivers who are new to the series. Younger drivers seem more willing to take risks which can pay off in high average starting positions (think of the dip at age 20/21), but these don’t always translate to high average finishing positions as we saw above. It’s much easier to get away with “on the edge” driving for one lap compared to 80.

And finally, we have the aging curve for championship finishing position.

The championship aging curve shows that the peak year for championship position is right around the age of 27. At first glance this might surprise you given that the average age of the champion has been close to 30 years old for the years 2000-2017, but remember that this is an aging curve for all drivers, not just drivers that went on to win the championship. On average, drivers are achieving one of their best championship finishes around the age of 27. Early on in their careers, drivers experience a lot of variation in season to season performance in the championship. This is likely because a few good results (which might be the result of luck early on) can boost drivers up the standings a lot when they are near the back in points. After drivers hit 23 their performance becomes more predictable throughout the rest of their career. Drivers drop back roughly 0.75 places in the points standings each year after the age of 28.

As I mentioned before, aging curves are not a perfect forecast: they are just our best guess based on how typical drivers have aged. One of the main drawbacks of aging curves, and a problem with every sport, is called survivorship bias. This is the idea that only the best drivers will be included in the later age ranges because all of the worse performing drivers will have been let go by then. If a driver isn’t good, he won’t stay in the series for long and he won’t have many back to back seasons to use. This is especially a problem in the first few seasons as new drivers come and go having only raced in one or two seasons. After that things start to even out and the survivorship bias doesn’t matter as much because most of the drivers in the series are typical of the rest of the drivers who have made it that far too. This is something to keep in mind when looking at aging curves.

There is a lot that can be done with aging curves and a lot that can be learned from them — too much to include all in one article. Here are the main takeaways from the aging curves and what we’ve looked at in this post:

  1. The typical driver peaks in average finishing and average starting position around the age of 28. 
  2. Drivers lose their ability to have a high average starting position quicker than their ability to have a high average finishing position. This is possibly due to younger drivers’ willingness to take risks in qualifying and the better experience older drivers have in managing race situations.
  3. There is a lot of volatility in how drivers perform in the championship early on in their career. This evens out as their career goes on.
  4. Survivorship bias is an important thing to remember when evaluating aging curves.

As the season gets closer and the rest of the vacant seats get filled, I will post my projections for the 2018 IndyCar season based partly on these aging curves for each driver’s average starting/finishing position and their championship position. Look out for those!

Follow The Single Seater on Twitter!

by Drew

Don’t Try to Predict Where a Driver Will Finish Based on Where He’s Starting

The general consensus has always been that the higher up you start in the grid, the higher you’ll finish in the race. This makes sense. The fastest cars qualify at the front of the grid and we expect them to perform well in the race too. But what exactly is the relationship between starting position and finishing position? Can we tell a lot about where a driver is likely to finish based on where he starts? These are a couple of the questions I want to look at today.

Using data from 2012-2017, I looked at how starting position correlates with finishing position. Here’s a plot of finishing position vs. starting position for those years, with a trend line added.

A driver’s starting position explains only 12.9% of their variation in finishing position, meaning that a driver’s starting position isn’t very predictive of their finishing position. If you just know where a driver started, you can’t predict their finishing position with much accuracy. I was expecting this number to be a little higher, but the more I thought about it, the more it started to make sense. 
There are a lot more factors that go into where a driver ends up finishing a race than simply where he starts. Accidents, untimely cautions, and differences in strategy are just some of the things that can make high qualifying drivers perform poorly and let poor qualifying drivers sneak into the top half of the field. Qualifying position is only one part of the puzzle that determines the results of a race. And on top of that, the difference between say 12th and 13th place is usually down to more luck of the draw than driver skill, making the prediction of individual places difficult. 
But this leads us to another question: what are the factors that truly impact where a driver finishes a race? Are practice results predictive of race performance? Or how a driver has raced at that track in the past? Or is racing inherently subject to so much randomness that it can’t be predicted with much accuracy? These are all interesting questions that I would like to tackle in the future, but trying to determine how they all interact with each other would be too much for one article. I would like to look at these factors one by one in different articles in the future (starting with qualifying position in this one). Race prediction is obviously the ultimate goal, but before we can determine if it’s even a feasible goal, we need to see what the different factors are that go into determining where drivers end up in a race. 
What we do know now is one part of the bigger picture: qualifying position is a statistically significant predictor of finishing position, but it isn’t a very good one. It explains just around 13% of the variation in finishing position and has a standard error of 6.7 places. Qualifying is a good place to start our investigation into finishing position, and I plan on looking at the other aspects that go into race performance in future posts.
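For anyone who wants to run the same kind of check, here is a rough Python sketch of how the two numbers above (the share of variation explained and the standard error) fall out of a simple straight-line fit. This is an illustration and not the exact code behind the plot. The function name `fit_start_vs_finish` and the eight start/finish pairs are made up, so the output will not match the 2012-2017 results.

```python
from math import sqrt


def fit_start_vs_finish(starts, finishes):
    """Least-squares line of finishing position on starting position.

    Returns (slope, intercept, r_squared, std_error), where std_error is
    the residual standard error in places.
    """
    n = len(starts)
    mean_x = sum(starts) / n
    mean_y = sum(finishes) / n
    sxx = sum((x - mean_x) ** 2 for x in starts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(starts, finishes))
    syy = sum((y - mean_y) ** 2 for y in finishes)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Sum of squared misses between actual and predicted finishes.
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(starts, finishes))
    r_squared = 1 - sse / syy        # share of variation explained
    std_error = sqrt(sse / (n - 2))  # typical miss, in places
    return slope, intercept, r_squared, std_error


# Hypothetical results for one eight-car race (not real data).
starts = [1, 2, 3, 4, 5, 6, 7, 8]
finishes = [3, 1, 8, 2, 7, 4, 5, 6]

slope, intercept, r_squared, std_error = fit_start_vs_finish(starts, finishes)
print(f"R^2 = {r_squared:.3f}, standard error = {std_error:.2f} places")
```

An R^2 near 0 means starting position tells you almost nothing about the finish, and an R^2 near 1 means it tells you almost everything.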
by Drew

Graph of the Day: Age of the IndyCar Champion Over Time

Graph of the Day is a short piece where I post an interesting graph or chart I came across while doing research. Today’s graph looks at how the age of the IndyCar champion has changed over the years. 
Before Newgarden’s title at the age of 26, it appeared the age of the champion was rising ever so slightly on average. Newgarden was the fifth youngest champion since the year 2000. 
It’ll be interesting to see how the younger drivers in the series (Newgarden, Rahal, Rossi) change this trend over time. As the veteran drivers start to retire, I’d expect the average age of the series champ to tick down a little.
by Drew

A Better Measure of Season Competitiveness in IndyCar

In the last article I wrote, I talked about a way to measure the competitiveness of a given IndyCar season. If you haven’t read that article, I would recommend doing so before continuing with this one. That measure was a fairly good first attempt at measuring competitiveness: it gave a good idea of the spread of the field and how dominant the champion was. Kyle Brown, a fellow IndyCar blogger who focuses on the statistics and data of the sport, left a comment on that post suggesting a different approach to measuring competitiveness that built off of what I started with.

Kyle’s suggestion was to sum all of the competitiveness ratios (now referred to as CR) for a given set of the field (we looked at sums of the top-10, top-5, and top-3 drivers specifically). Then, we averaged the ratios for all drivers for each of the sets we looked at — for example, for the top-3 set, we summed second place’s CR and third place’s CR and averaged them. This leaves us with Average Competitiveness Ratio or ACR. As a reminder, an individual place’s CR is given by:

CR = (Champion’s Points – X Place’s Points)/Total Possible Points for a Driver
The advantage of this system over my original is that it takes into account how close each driver was to the champion as opposed to just the X place driver. For example, consider the following two seasons. In hypothetical season A, the top-9 drivers were all separated by one point each and the tenth driver was 200 points back from the champion. In hypothetical season B, the top-9 drivers were all separated by 20 points each and the tenth driver was 200 points back from the champion. Under my original method, these seasons would both have the same competitiveness ratio for the top-10 of the field. Under the new system, season A would be considered more competitive (as it should be, because there are more drivers in the championship battle and close to each other) because its ACR would be less than season B’s. My original method is a good measure of the competitiveness in terms of the spread of the field, but ACR is a better measure of how competitive all of the drivers were in terms of the championship battle. 
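The two hypothetical seasons can be worked through in a few lines of Python. This is only a sketch of the calculation as described above. The champion's total (600) and the total possible points (700) are made-up numbers chosen for illustration, and the function names are my own.

```python
def competitiveness_ratio(champion_points, place_points, max_points):
    """CR for one place: the gap to the champion over total possible points."""
    return (champion_points - place_points) / max_points


def average_competitiveness_ratio(standings, top_n, max_points):
    """ACR: mean CR of places 2 through top_n.

    `standings` is a list of points totals sorted with the champion first.
    """
    champion = standings[0]
    ratios = [competitiveness_ratio(champion, points, max_points)
              for points in standings[1:top_n]]
    return sum(ratios) / len(ratios)


MAX_POINTS = 700  # hypothetical total possible points for a driver
CHAMPION = 600    # hypothetical champion's total

# Season A: top nine separated by 1 point each, tenth place 200 back.
season_a = [CHAMPION - gap for gap in range(9)] + [CHAMPION - 200]
# Season B: top nine separated by 20 points each, tenth place 200 back.
season_b = [CHAMPION - 20 * gap for gap in range(9)] + [CHAMPION - 200]

for name, season in (("A", season_a), ("B", season_b)):
    cr_10th = competitiveness_ratio(season[0], season[9], MAX_POINTS)
    acr_10 = average_competitiveness_ratio(season, 10, MAX_POINTS)
    print(f"Season {name}: 10th place CR = {cr_10th:.3f}, "
          f"top-10 ACR = {acr_10:.3f}")
```

Both seasons get the same tenth place CR because the champion-to-tenth gap is 200 points in each, but season A's top-10 ACR is much lower because places two through nine sit right behind the champion.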
Once again, this process gives us a ratio for each season from 0 to 1. The former would be a perfectly competitive season (all drivers score the same number of points) and the latter would be a perfectly non-competitive season (there is one driver winning every possible point in every race and the other drivers aren’t even competing). 
Now that we’ve got the description out of the way, we can get into the results. First, let’s look at the ACR for the top-10 drivers in the seasons 2000-17. 
The 2015 and 2013 seasons both had an ACR of 0.097, making them the most competitive seasons in terms of top-10 competitiveness in our data set. For comparison, 2015 — when Juan Pablo Montoya and Scott Dixon ended up tied in the points after Sonoma — was the most competitive season by my original method. 2001 was the least competitive year as Sam Hornish Jr. won the championship 100 points clear of the field. We see the same drop off at the 2012 season that we saw the last time around, and once again, I think this could be because of the adoption of the Dallara DW-12 chassis. It seems to have made the field more competitive on the whole (or at least the top half of the field).
We get the following graph for the top-5 spots in the championship.
The results show us that 2006 had the most competitive top-5 championship. This season the top four places in the championship were separated by just 15 points. Hornish Jr. and Dan Wheldon were tied after the last race and the former won on a tiebreaker. Six different drivers took home race wins and nine different drivers finished on the podium. 
And finally, we have the graph for the top-3. 
2006 was the most competitive season for the top-3 places in the championship, for the same reasons mentioned above. 2001 was the least competitive season for the top-3, in large part because the champion was 100 points clear of the runner-up. And in 2016, Simon Pagenaud won by 127 points, making it the second least competitive season in our time frame with an ACR of 0.14.
Using ACR as opposed to CR for a given place provides a more accurate measure for the competitiveness of a season. It takes into account all places within the specified range and shows how competitive the championship battle really was, which is, at the end of the day, what we all really care about. Exciting championship battles are a sign of an exciting season. I would say using ACR for the top-5 or top-10 drivers is my preferred range when looking at how competitive a season is, as that’s where most of the action on track takes place each race. I’d prefer a season with a lower top-5 or top-10 ACR over one with just a super low top-3 ACR, because that means many drivers are in the hunt, even if it means the gap to second or third isn’t incredibly low (<10 points). People could have different opinions on which method they prefer to look at of course.
Thanks again to Kyle for the great suggestion and help in compiling the data for this project. 
by Drew

Competitiveness in IndyCar

This year’s IndyCar championship came right down to the wire. There were multiple drivers in the hunt heading into Sonoma and Josef Newgarden ended up winning the title by just 13 points. Helio Castroneves, who finished fourth in the championship, was less than 50 points behind Newgarden when the checkered flag flew. This was a highly contested championship, and it got me thinking about competitiveness within an IndyCar season. There are definitely seasons where there is a dominant driver and ones where multiple drivers are battling it out all season and everyone is close. 
To answer this question of competitiveness, I first had to decide how to measure it. I wanted to look at how competitive the top half of the field was in a given year. I chose the top half to eliminate the problem that some seasons had many lower-level drivers with a lot of DNFs, and account for the fact that different seasons ran with different numbers of drivers. This made the process easier and more valid in my opinion. In the future I might look at competitiveness across the entire field if I come up with a good way to deal with the differing number of drivers and the DNF problem. Looking at the top ten competitiveness gives us a good idea of how hotly contested the championship is and how the races are likely to turn out — plus, most of the important action throughout races happens in the top ten. Less competitive seasons will see repeat winners and more straightforward races for the most part. Since realistically the entire field does not have a shot at winning the title, looking at the top ten is sufficient for our discussion. 
I used a measure similar to the one The Stats Zone used to measure competitiveness across soccer leagues in Europe to measure competitiveness in IndyCar. 
I took the total number of points the champion of that season scored and subtracted from it the number of points the 10th place driver in the championship scored. I took this number and divided it by the total possible points a driver could have scored in that season to account for the different points systems and number of races across seasons. This gives us a ratio from 0 to 1, where 0 would mean a perfectly competitive season (all drivers score the same number of points) and 1 would mean a perfectly non-competitive season (there is one driver winning every point in every race and other drivers aren’t competing). Obviously both of these scenarios are basically impossible, but they represent the extremes and what we mean by perfectly competitive and perfectly non-competitive. 
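Here is a small Python sketch of that calculation, including the two extremes. The points totals and the 700-point maximum are invented for illustration and are not from any real season, and the function name is my own.

```python
def competitiveness_ratio(standings, max_points, place=10):
    """(champion's points - Nth place's points) / total possible points."""
    ranked = sorted(standings, reverse=True)
    return (ranked[0] - ranked[place - 1]) / max_points


MAX_POINTS = 700  # hypothetical total possible points for one driver

# A made-up season: champion on 600, tenth place on 380.
typical = [600, 587, 570, 552, 530, 501, 470, 441, 410, 380]
# Perfectly competitive: everyone scores the same.
all_equal = [400] * 10
# Perfectly non-competitive: one driver takes every point.
dominated = [MAX_POINTS] + [0] * 9

for label, season in (("typical", typical),
                      ("all equal", all_equal),
                      ("dominated", dominated)):
    print(label, round(competitiveness_ratio(season, MAX_POINTS), 3))
```

Dividing by the total possible points is what lets seasons with different points systems and race counts sit on the same 0 to 1 scale.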
Here is a chart of all of the competitiveness ratios from 2000-17:

By competitiveness ratio, 2015 was the most competitive IndyCar season. In this season, Scott Dixon and Juan Pablo Montoya ended up tied for the points title after Sonoma, and the former won it on the tiebreaker. Three drivers were within 100 points of Dixon and it was a hard fought championship until the end. The ’13 and ’12 seasons fall in spots two and three in competitiveness. These years saw four and five drivers respectively finish within 100 points of the champion, and ’12 was won by just three points by Ryan Hunter-Reay. The least competitive IndyCar season was 2001 when Sam Hornish Jr. won the championship by over 100 points, finishing on the podium ten times in the 13 race season. 2002 was also an interesting year: while Helio Castroneves was only 20 points behind Hornish Jr., who won the championship again, the rest of the top ten dropped off quickly after that, giving the season a competitiveness ratio of 0.32.
Looking at competitiveness ratio by season, we can see an interesting trend. Right around 2012 there was a drop off and the series has been more competitive since. This could be because of the introduction of the Dallara IR-12 chassis (better known as the DW12) at the start of the 2012 season. The new chassis might have leveled out the competition more and be the reason for the drop off we see. Aero kits were introduced in 2015, but there wasn’t much of a change from the two years prior in terms of competitiveness. It’ll be interesting to see how the universal aero kits being introduced for the 2018 season will change the competitiveness of the series.
Competitiveness in IndyCar is an interesting topic and I’d definitely like to explore it more in the future. Finding a way to incorporate full field competitiveness is a goal of mine and I’ll be posting more about the subject in the future.

Read part two of this series on competitiveness in IndyCar here.

by Drew

Updated 11/17: Fixed an error in allocating bonus points for certain seasons. The least competitive season is now 2001 instead of 2008. The chart and article have been updated with the small change.