The first weekend is over. The filled brackets are decorated with circles and slashes, if they have not been torn up yet. For college basketball fans, March Madness may be the most wonderful time of the year.
Unfortunately for those who follow St. John’s basketball, this time of year, especially this year, leaves us filled with wonder, what-ifs, and what should-have-beens.
Part of the tournament-watching experience has included, and will inevitably continue to include, thoughts that the Red Storm are better than this team, or that St. John’s would have fared better than that one. As it turns out, St. John’s would probably beat a quarter of the teams in the field.
Using regression analysis, Rumble in the Garden has created a forecasting model that blends data from ESPN’s BPI, Jeff Sagarin’s “predictor” ratings, and Ken Pomeroy’s ratings. The result is the “What Could Have Been” Index, applied here to the beloved St. John’s Red Storm.
How It Works
Each tournament team’s raw BPI was pulled into a table, along with ESPN’s win probability for every first-round matchup. From there, the difference in BPI was calculated for the two teams in each contest.
After calculating each match-up’s BPI difference, a regression model was created to determine how much a marginal increase in that difference impacts win probability. With an R-squared (a statistical measure of how much of the variability in one factor can be explained by its relationship to another) of 0.95 and a Significance F (a measure of whether the model’s predictive capability is statistically significant) of 2.60271E-43, the model’s results are both predictive and statistically significant. More specifically, the difference in BPI predicts the BPI win probability given by ESPN. For example, Seton Hall has a BPI of 11.5 compared to NC State’s 9.2. That difference is what leads to Seton Hall’s 59.9% win probability using BPI alone. The smaller the difference, the less predictable the winner.
The next step is plugging in St. John’s BPI (6.2) to find the difference against every team. Against Seton Hall, that difference is -5.3. Plugging it into the model above gives St. John’s a 33.41% chance of beating Seton Hall today on a neutral floor.
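As a rough sketch of that two-step process, the snippet below fits a simple least-squares line to (BPI difference, win probability) pairs and then plugs in St. John’s difference against Seton Hall. The training pairs are invented for illustration, so the fitted coefficients, and the output, differ from the article’s actual model:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Hypothetical (bpi_difference, espn_win_probability) training pairs;
# the real model was fit on the actual 2018 first-round projections.
games = [(-12.0, 8.0), (-6.0, 28.0), (-2.3, 40.1),
         (0.0, 50.0), (2.3, 59.9), (6.0, 72.0), (12.0, 92.0)]
a, b = fit_linear([g[0] for g in games], [g[1] for g in games])

# St. John's (BPI 6.2) vs. Seton Hall (BPI 11.5): difference of -5.3
sju_prob = a + b * (6.2 - 11.5)
print(round(sju_prob, 1))  # → 31.2 on this toy fit
```

Any underdog (negative difference) lands below 50% on a fit like this; the article’s real coefficients produce the 33.41% figure.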
Jeff Sagarin’s “Predictor” Ratings
The process here is similar to the BPI process above, with a couple of extra steps. Sagarin’s ratings work differently: once the difference in ratings is calculated, with extra points added for home-court advantage, that difference represents the projected point spread. In the St. John’s versus Seton Hall example, Seton Hall has a rating of 85.37 and St. John’s has a rating of 80.23. With no home-court advantage to factor in, Seton Hall should be favored by 5.14. Given the actual spreads, St. John’s +11.5 on December 31 at Seton Hall and St. John’s -2.5 on February 24 at home, St. John’s +5.14 on a neutral floor makes sense.
Because the spread itself is not a predictive percentage, it is necessary to convert the spread. Using a conversion table created by BoydsBets, a regression model was created to determine win probability based on the spread. With an R-Squared of 0.998819676 and a Significance F of 1.17913E-36, the model was considered both predictive and statistically significant. To reiterate, the spread for each team was predictive of the win probability determined by BoydsBets.
With St. John’s being a +5.14 against Seton Hall today on a neutral floor, the model based on Sagarin’s Ratings and the table from BoydsBets shows that St. John’s has a 31.40% chance of beating Seton Hall under the given conditions.
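A minimal sketch of that spread-to-probability conversion, using an invented, simplified stand-in for the BoydsBets table (the real table is much larger and not perfectly linear, which is why this toy version does not land exactly on 31.40%):

```python
# Small made-up conversion table: (spread, win probability);
# negative spread means the team is the favorite.
table = [(-10, 84.0), (-5, 67.0), (0, 50.0), (5, 33.0), (10, 16.0)]

# Ordinary least squares fit of probability on spread.
n = len(table)
mean_s = sum(s for s, _ in table) / n
mean_p = sum(p for _, p in table) / n
slope = sum((s - mean_s) * (p - mean_p) for s, p in table) / \
        sum((s - mean_s) ** 2 for s, _ in table)
intercept = mean_p - slope * mean_s

# Sagarin: Seton Hall 85.37 vs. St. John's 80.23 -> St. John's is +5.14
spread = 85.37 - 80.23
sju_prob = intercept + slope * spread
print(round(sju_prob, 1))  # → 32.5 on this toy table
```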
Ken Pomeroy’s Ratings
The same analysis applied to BPI above is applied to the KenPom ratings. The difference in KenPom rating is calculated for each first-round match-up, and those differences are plugged into a model with the KenPom win probabilities for each game. With an R-squared of 0.965865293 and a Significance F of 3.47038E-47, the difference in KenPom ratings effectively projects the KenPom win probability.
With St. John’s having a KenPom rating of +10.08 compared to Seton Hall’s +17.33, the difference of -7.25 projects that St. John’s has a 37.33% chance of beating Seton Hall today on a neutral floor.
Each of these predictive forecasts was combined into an unweighted average, resulting in the “What Could Have Been” Index projecting that St. John’s would have a 34.05% chance of beating Seton Hall today on a neutral floor.
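The blend itself is just the arithmetic mean of the three model outputs, using the numbers given above:

```python
# Unweighted average of the three model probabilities for
# St. John's vs. Seton Hall, as reported above.
bpi_prob, sagarin_prob, kenpom_prob = 33.41, 31.40, 37.33
index_prob = (bpi_prob + sagarin_prob + kenpom_prob) / 3
print(round(index_prob, 2))  # → 34.05
```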
Flaws in the Model
While the models themselves are both predictive and statistically significant, there are notable flaws.
First, each model is built on a small sample size: only 25 to 64 observations, because each includes only the projections for this NCAA Tournament. A future model may incorporate how the ratings projected previously played games, but for now the limitation is accepted on the assumption that all of those previous games already inform the accuracy of the current ratings.
Second, the models do not take travel or injuries into account. All of the ratings are based on wins and losses, strength of schedule, and final scores, though each is calculated differently. The remaining factors that affect teams, such as fatigue, injuries, academic schedules, or weather, are not baked into the ratings or the “What Could Have Been” Index. As a result, all neutral courts are treated the same: the floor in Boise, Idaho, where Kentucky played, is treated the same as the floor in Pittsburgh, Pennsylvania, where Villanova played.
Finally, the input data is accurate as of Tuesday, March 13, prior to the start of the First Four. As games are played, the ratings from each source move slightly. The index retains its pre-Tournament ratings in the interest of observing the unpredictable chaos that has occurred, and will continue to occur, in single-elimination playoff formats.
What Could Have Been Index: St. John’s Win Probability Against Tournament Teams
| Team | Win Probability |
| --- | --- |
| Cal St. Fullerton | 75% |
| New Mexico St. | 48% |
| North Carolina Central | 93% |
| North Carolina St. | 41% |
| San Diego St. | 43% |
| South Dakota St. | 53% |
| Stephen F. Austin | 63% |
Reactions to the Model
If St. John’s had won the Big East Tournament, then even without recalculating the ratings to reflect those extra wins, the Red Storm would not look to be in bad shape in the NCAA Tournament.
If the Red Storm replaced New Mexico State as the 12 seed and played Clemson in the Midwest, they would have an 11% chance of making it to the Sweet 16 by beating Clemson and the winner of Auburn-Charleston. Even against the top four seeds, on a neutral floor today, St. John’s would have a 5% chance against Virginia, a 5% chance against Villanova, a 17% chance against Kansas, and a 24% chance against Xavier. However...
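The 11% figure is a compound probability over the bracket path: beat Clemson, then beat whoever survives Auburn-Charleston. A sketch of that arithmetic with placeholder inputs (the article does not list the individual game probabilities, so every value below is hypothetical):

```python
# All probabilities below are placeholders for illustration only.
p_beat_clemson = 0.33      # hypothetical first-round win probability
p_auburn_advances = 0.80   # hypothetical chance Auburn beats Charleston
p_beat_auburn = 0.30       # hypothetical
p_beat_charleston = 0.50   # hypothetical

# P(Sweet 16) = P(beat Clemson) * P(beat the second-round opponent)
p_sweet16 = p_beat_clemson * (
    p_auburn_advances * p_beat_auburn
    + (1 - p_auburn_advances) * p_beat_charleston
)
print(f"{p_sweet16:.1%}")  # → 11.2% with these placeholder inputs
```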
The specific circumstances that could have gotten St. John’s to the Sweet 16, winning the Big East Tournament and playing as the Midwest 12 seed, are exactly that: specific. St. John’s would likely be favored in only 19 of these 68 potential match-ups (including Murray St., where a 49.7% chance rounds up).
Having seen the Red Storm win over the likes of Villanova, Duke, and Butler and regularly hang in there with Xavier and Creighton, my gut feeling would be that St. John’s would have been favored in almost 30 games.
However, the inconsistent play and the all-too-frequent blowouts likely set the Red Storm back in how the analytics suggest they would match up.
The model shows that St. John’s is good enough to beat a handful of teams in the tournament this year. Having watched the games, St. John’s can be good enough to stay in games that the analytics suggest they have no business being in.
St. John’s has shown me they can be good enough. They have even hinted it to the analytics. Now they just need to get to the tournament to show the rest of the world.