One interesting topic I’ve wanted to tackle for some time is forecasting the IndyCar championship. Who is most likely to win the championship at any point in the season? Further, what is the probability of a driver finishing in a specific position in the championship?
To answer these questions, we need to start with the current points standings. Next, we need a way to simulate the remaining races of the season over and over again to see how these simulated seasons play out. To do this, I use each driver’s past race results (finishing position, average track position, and average track position in the last 25% of the race) to establish a baseline for both their performance and their consistency. These results form a unique distribution for each driver, which is used to simulate each of the remaining races. Each driver gets a random draw from their distribution, the draws set the finishing order, and points are awarded according to IndyCar’s points system. At the end of each simulated season, the points each driver has earned are tallied and the championship standings are determined. This process is repeated for 50,000 seasons, and the fraction of those seasons in which Driver X finishes in position Y is the forecast probability for that driver and position. The figure of 50,000 wasn’t chosen for any particular reason other than that the results converged when the simulation was repeated.
This method falls under the category of a Monte Carlo simulation.
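The simulation loop can be sketched in Python as follows. This is a minimal sketch, not the model itself: it assumes each driver’s distribution can be summarized as a normal “performance score” with a mean and standard deviation (lower is better), whereas the post builds it from finishing and track-position data that isn’t reproduced here. The points table is an approximation of IndyCar’s finishing-position points with bonus points left out, and the names `forecast`, `simulate_race`, and `ratings` are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Approximate IndyCar finishing-position points (assumption); bonus points
# for pole and laps led are ignored in this sketch.
POINTS = [50, 40, 35, 32, 30, 28, 26, 24, 22, 20,
          19, 18, 17, 16, 15, 14, 13, 12, 11, 10,
          9, 8, 7, 6, 5]


def race_points(position):
    """Points for a 1-based finishing position (5 beyond the table)."""
    return POINTS[position - 1] if position <= len(POINTS) else 5


def simulate_race(ratings, rng):
    """Return one simulated finishing order, best first.

    ratings maps driver -> (mean, sd) of a hypothetical performance score
    where lower is better, so the smallest random draw wins the race.
    """
    draws = {d: rng.gauss(mu, sd) for d, (mu, sd) in ratings.items()}
    return sorted(draws, key=draws.get)


def forecast(standings, ratings, races_left, n_seasons=50_000, seed=0):
    """Estimate P(driver finishes in championship position p).

    standings: current points per driver (same keys as ratings).
    Returns {driver: {position: probability}}.
    """
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for _ in range(n_seasons):
        totals = dict(standings)
        for _ in range(races_left):
            for pos, driver in enumerate(simulate_race(ratings, rng), start=1):
                totals[driver] += race_points(pos)
        # Ties keep input order (sorted is stable), not IndyCar's tiebreakers.
        final = sorted(totals, key=totals.get, reverse=True)
        for pos, driver in enumerate(final, start=1):
            counts[driver][pos] += 1
    # Probability = share of simulated seasons ending in that position.
    return {d: {p: n / n_seasons for p, n in c.items()}
            for d, c in counts.items()}
```

Normal draws are a simplification chosen for brevity; sampling directly from each driver’s past results (a bootstrap) would stay closer to the empirical distributions the post describes.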
At the beginning of the season, we obviously have less reliable data about a driver than we do when we’re halfway through or heading into the finale. The model accounts for this by weighting the information in a driver’s distribution in proportion to how much of the season is complete. In essence, towards the end of the season we know a ton about each driver, so we trust their past performances almost completely. After just one race, by contrast, we still use each driver’s first result as input, but we treat the field as mostly interchangeable drivers who are all roughly average. Right now, past season performances aren’t included in the model because there is no good way to predict how a driver or car has changed since the end of last season.
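One way to express “trust the driver’s own results in proportion to how much of the season is complete” is a simple linear shrinkage toward the field average. The sketch below assumes the weight on a driver’s own results equals the fraction of races completed; the post doesn’t specify the exact rule, and `blended_mean` is a hypothetical name.

```python
from statistics import mean


def blended_mean(driver_results, field_results, races_done, total_races):
    """Shrink a driver's average result toward the field average.

    The weight on the driver's own results is the fraction of the season
    completed (a hypothetical linear rule), so after one race the field
    average dominates and near the finale the driver's own record does.
    """
    w = races_done / total_races
    return w * mean(driver_results) + (1 - w) * mean(field_results)
```

In a hypothetical 17-race season with a 24-car field averaging 12.5, a driver who finished 2nd in the opener gets a simulated mean of about 11.9, while a driver averaging 4th after 16 races gets 4.5.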
The simulation also accounts for the fact that a driver might perform in a way that he or she hasn’t before. Once again, this is in proportion to how much of the season is complete at the time of the forecast. Some drivers are also more consistent than others, so a driver’s consistency influences how likely they are to perform drastically differently from their previous results.
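The spread of each driver’s distribution can be handled the same way. In this sketch, a wide prior spread dominates early in the season, leaving room for results a driver hasn’t produced yet, and the driver’s own spread takes over later so that inconsistent drivers keep wider distributions. The linear weighting, `prior_sd`, and the `min_sd` floor are all assumptions, and `blended_sd` is a hypothetical name.

```python
from statistics import pstdev


def blended_sd(driver_results, prior_sd, races_done, total_races, min_sd=1.0):
    """Spread of a driver's simulated performance score.

    Early in the season the wide prior_sd dominates, leaving plenty of room
    for results a driver has not produced yet. Later, the driver's own
    spread takes over, so an inconsistent driver keeps a wider distribution
    than a consistent one. min_sd keeps a small chance of surprises even
    for a driver whose results have been identical.
    """
    w = races_done / total_races
    own_sd = pstdev(driver_results) if len(driver_results) > 1 else prior_sd
    return max(w * own_sd + (1 - w) * prior_sd, min_sd)
```

Halfway through a 16-race season with `prior_sd=6.0`, a driver who finished 5th every time gets a spread of 3.0, while a driver alternating between the front and the back keeps a much wider one.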
Drivers are not penalized for DNFs, since DNFs are mostly random and unpredictable. They are excluded from a driver’s distribution, even when the driver was at fault.
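Excluding DNFs amounts to filtering a driver’s results before building their distribution. This sketch assumes results arrive as hypothetical `(position, status)` pairs in which any status other than “running” marks a DNF; real result feeds may label retirements differently.

```python
def results_without_dnfs(results):
    """Keep finishing positions only from races the driver completed.

    results is a hypothetical list of (position, status) pairs; any status
    other than "running" counts as a DNF and is dropped regardless of who
    was at fault, so retirements never drag down the distribution.
    """
    return [pos for pos, status in results if status.lower() == "running"]
```

For example, `results_without_dnfs([(3, "running"), (22, "contact"), (7, "running")])` returns `[3, 7]`.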
Double points races are accounted for in the season simulations.
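Double points can be handled by flagging each remaining race and doubling the finishing-position points for flagged races. The sketch below repeats an approximate points table (bonus points ignored) so it stands alone; which races pay double depends on the season’s schedule, and `simulated_points` and `double_flags` are hypothetical names.

```python
# Approximate finishing-position points (assumption; bonus points ignored).
POINTS = [50, 40, 35, 32, 30, 28, 26, 24, 22, 20,
          19, 18, 17, 16, 15, 14, 13, 12, 11, 10,
          9, 8, 7, 6, 5]


def race_points(position, double=False):
    """Points for a 1-based finishing position, doubled if flagged."""
    base = POINTS[position - 1] if position <= len(POINTS) else 5
    return 2 * base if double else base


def simulated_points(finishes, double_flags):
    """Total points for one driver's simulated finishes, race by race."""
    return sum(race_points(p, d) for p, d in zip(finishes, double_flags))
```

For example, `simulated_points([1, 3, 10], [False, True, False])` returns 140 (50 + 2 × 35 + 20).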
The current forecast for the IndyCar championship can be found here.