
Sunday, November 9, 2014

3-point shooting percentage projection model

During the application/interview process with the Philadelphia 76ers front office, I was given the project of predicting three-point shooting percentages for all NBA players this season.  Based on my general awareness of statistical projection systems for baseball, the basis of the model would be to use historical data to estimate a player’s true skill level.  However, there are additional factors that could influence a player’s percentages.  For example, a player’s true skill level can evolve over time, and while the direction and extent of that change may vary significantly by player, there could be some generic trend evident across basketball.  In addition, a player’s shooting percentages can depend heavily on the intra-game context of his shots, such as the location of the shots (distance or location along the arc), how open he is, and whether the shots are off-the-dribble or catch-and-shoot.  Furthermore, there may be subtle inter-game influences such as whether the games occur at home or on the road and how much travel and rest time the player has had.  Of the many variables that could in theory impact three-point shooting percentages, many are either unknown or may turn out to have minimal average effects.  As a result, the goal of this project was to build a foundational model that can predict three-point shooting percentages on its own and that can be extended in the future to include additional variables.

Thursday, June 5, 2014

2014 NBA Playoffs Finals Preview

We're adding smaller and smaller sample sizes as the playoffs progress, so my overall results won't change much.  I am now 28-21 overall after going 3-2 in the Conference Finals:

SAS 5 times: 3-2-0

Wednesday, June 4, 2014

2014 NBA Draft Big Board 1.0

This is the first version of my attempt at creating a 2014 NBA Draft Big Board. First, here are some of its guiding principles:

Sunday, May 18, 2014

2014 NBA Playoffs Conference Finals Preview

After an excellent first round, my model had a much less successful second round, as I was 5-5-0 against the spread using my model with subjective adjustments, making me 25-19 for the entire playoffs.  Here are the full results:

LAC 6 times: 2-4-0
WAS 4 times: 3-1-0

Granted, this was an extremely small sample size, and adding my second-round results to my first-round results doesn't significantly increase the likelihood a coin-flip strategy would match my record.  The bigger issue is the 2-4 record betting on the Clippers given the confidence I had in that bet.  Specifically, a 57%-weighted coin would be just as likely to finish with a record as poor as 2-4 as a fair coin would be to finish with a record as good as 25-19.
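The comparison between the two coins can be checked with exact binomial tail sums.  The sketch below is only an illustration: the function names are my own, and it assumes each bet is an independent trial with a fixed win probability, ignoring the one push.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n independent bets with win probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k, n, p):
    """Probability of k or fewer wins (lower tail)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


def prob_at_least(k, n, p):
    """Probability of k or more wins (upper tail)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


# A 57%-weighted coin going 2-4 or worse in 6 bets.
weighted_as_bad = prob_at_most(2, 6, 0.57)

# A fair coin going 25-19 or better in 44 bets.
fair_as_good = prob_at_least(25, 44, 0.5)

print(f"57% coin, 2-4 or worse:    {weighted_as_bad:.3f}")
print(f"fair coin, 25-19 or better: {fair_as_good:.3f}")
```

Both probabilities come out a little above 22%, which is the sense in which the two records are about equally surprising.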

Monday, May 5, 2014

2014 NBA Playoffs 2nd Round Preview

There are many potential areas of improvement in both the model and the testing of the model, most importantly perhaps a means of adjusting for various series states (e.g. how should the line be adjusted when the home team is down 1-0), but in the still small sample size of the first round, I was 20-14-1 against the spread using my model with subjective adjustments.  While that's a 59% win percentage excluding the push (which would be amazing if it were indeed my true win percentage), the small sample size means that this likely isn't my true win percentage, as even a coin flip would finish with the same record or better in 34 trials about 20% of the time.  Here are the full results:

ATL 6 times: 3-2-1
MIA 3 times: 2-1-0
BKN 7 times: 4-3-0
CHI 2 times: 1-1-0
DAL 1 time: 1-0-0
SAS 1 time: 0-1-0
MEM 7 times: 4-3-0
GSW 7 times: 5-2-0
HOU 1 time: 0-1-0
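The coin-flip figure quoted before the results list can be reproduced with a short calculation.  This is a minimal sketch with a helper name of my own choosing; it drops the push and treats the remaining 34 bets as independent 50/50 trials.

```python
from math import comb


def coin_flip_tail(wins, bets):
    """Chance that a fair coin wins at least `wins` of `bets` independent bets."""
    favorable = sum(comb(bets, k) for k in range(wins, bets + 1))
    return favorable / 2**bets


win_pct = 20 / 34                      # the push is excluded
tail = coin_flip_tail(20, 34)

print(f"win percentage: {win_pct:.1%}")
print(f"fair coin at 20-14 or better: {tail:.3f}")
```

The tail probability is a little under 0.2, in line with the 20% mentioned above.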

Saturday, April 19, 2014

2014 NBA Playoffs 1st Round Preview

This is my first attempt at building a simple model for estimating NBA playoff series.  The idea was inspired by colts18 on the APBR forums.  The model is based on the xRAPM numbers on ESPN (where it's called "real plus minus"), which are supposedly the most predictive of the all-in-one metrics for NBA games.  I utilized my own minutes estimates for each series (subjectively based on regular season minutes, an increase for each team's top players in the playoffs, and potential matchup adjustments I expect each coach to employ) to calculate each team's offensive and defensive rating (the league average points per 100 possessions is set to 104 by weighting each team's offensive/defensive rating by each team's pace).  Then, I assumed 100 possessions per game (which is most definitely not true) and a home court advantage of four points (divided evenly between offense and defense, so 1 point per team per side of the ball) to calculate each team's Pythagorean win percentage at home and on the road, and used Bill James' Log5 formula for estimating a matchup based on each team's win%.  Finally, assuming games are independent (which, again, is also not true), each permutation was considered to calculate a team's series win percentage.  All of the computation was done in R and the script I ran will be included as well.  I will then comment on any other subjective observations for each series.
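The last three steps (Pythagorean win percentage, Log5, and enumerating series outcomes) can be sketched as follows.  This is not the R script mentioned above, just a Python illustration.  Several details are assumptions on my part: the Pythagorean exponent of 14, the 2-2-1-1-1 home/road pattern, the example ratings, and the way the one-point home court adjustment is applied to each rating.

```python
from itertools import product

PYTH_EXPONENT = 14.0      # assumed exponent; the post does not state one
HCA_PER_SIDE = 1.0        # 4-point home edge split as 1 point per team per side
HOME_PATTERN = "HHRRHRH"  # assumed 2-2-1-1-1 format, from the higher seed's view


def pythagorean(off_rtg, def_rtg, exponent=PYTH_EXPONENT):
    """Expected win% from points scored and allowed per 100 possessions."""
    return off_rtg**exponent / (off_rtg**exponent + def_rtg**exponent)


def log5(p_a, p_b):
    """Bill James' Log5 estimate that team A beats team B."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


def home_win_prob(home, road):
    """Chance the home team wins one game; each team is an (ORtg, DRtg) pair."""
    home_pct = pythagorean(home[0] + HCA_PER_SIDE, home[1] - HCA_PER_SIDE)
    road_pct = pythagorean(road[0] - HCA_PER_SIDE, road[1] + HCA_PER_SIDE)
    return log5(home_pct, road_pct)


def series_win_prob(p_at_home, p_on_road, pattern=HOME_PATTERN):
    """Chance the higher seed wins a best-of-seven with independent games.

    All seven games are enumerated even if the series would already be
    decided; that does not change the result.
    """
    total = 0.0
    for outcome in product((True, False), repeat=len(pattern)):
        if sum(outcome) < 4:
            continue
        prob = 1.0
        for won, site in zip(outcome, pattern):
            p = p_at_home if site == "H" else p_on_road
            prob *= p if won else 1 - p
        total += prob
    return total


# Hypothetical ratings, not taken from the post.
higher_seed = (108.0, 102.0)
lower_seed = (105.0, 104.0)
p_home = home_win_prob(higher_seed, lower_seed)
p_road = 1 - home_win_prob(lower_seed, higher_seed)
print(f"home game: {p_home:.3f}, road game: {p_road:.3f}")
print(f"series: {series_win_prob(p_home, p_road):.3f}")
```

Swapping in real ratings and a different exponent or home/road pattern only requires changing the constants at the top.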

Tuesday, April 15, 2014

Tanking in the NBA

People disagree about the significance of the tanking problem in the NBA, but no one doubts that it exists.  Most of the media coverage on tanking has focused only on the race for draft lottery ping-pong balls that was especially evident this year, given the expected strength of the incoming draft class and the projected gap between the top teams and the bottom teams before the season even began.  This kind of tanking can manifest in many different forms and degrees, with some front offices actively trading away productive players (Boston trading Pierce, Garnett, Lee, and Crawford or Philadelphia trading Turner and Hawes), others benching players towards the end of the year citing bogus injuries (Milwaukee holding Sanders out until it was beneficial to medically clear him to start his marijuana suspension), and others simply making no effort to improve the team at any point in the season (Philadelphia not bothering to reach the salary floor or Utah trading for Jefferson and Biedrins to reach the salary floor).  Still, this might not even be the most egregious manner in which teams actively try to lose games, as many of these draft lottery tankers initially tried to compete and arguably only Philadelphia, Utah, and Boston stuck to season-long losing blueprints.  There are two rules that even more directly incentivize teams to intentionally lose, and each of these is more easily fixable.

Wednesday, March 20, 2013

How to Mathematically Win March Madness

No, this is not a strategy for how to actually win the national championship, as if I had this strategy, I would not be at home writing about it.  However, given that an estimated $2.5 billion is gambled on the tournament, the real winners of March Madness aren't necessarily the National Champions but rather the people who win their bracket pools.  The most significant source of value added in filling out a bracket is actual basketball knowledge, so for example, if someone could predict every matchup with over 80% accuracy, he wouldn't need any other strategies.  But for normal people, here are a few tips to keep in mind:


1) Predicting the champion correctly is paramount
The degree of importance obviously depends on the scoring system, but I have yet to see a system that doesn't increase point totals in each round.  Using the default system of 1 point for each correct first-round game, 2 points for each correct second-round game,..., and 32 points for the correct champion, a correctly picked champion already nets 63 out of a total 192 points (the champion has to win each of its games); by comparison, correctly picking every first-round and second-round game (a total of 48 games) only nets 64 points.
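The point totals in this tip follow directly from the doubling scoring system and a 64-team bracket.  A quick check (the variable names are my own):

```python
# Default scoring: points per correct pick double each round.
round_points = [1, 2, 4, 8, 16, 32]
# A 64-team bracket has 32 first-round games, halving each round.
round_games = [32, 16, 8, 4, 2, 1]

total_points = sum(p * g for p, g in zip(round_points, round_games))

# A correct champion must also have been picked to win all six of its games.
champion_points = sum(round_points)

# Every first-round and second-round game picked correctly.
early_games = sum(round_games[:2])
early_points = sum(p * g for p, g in zip(round_points[:2], round_games[:2]))

print(total_points, champion_points, early_games, early_points)
```

Each round is worth the same 32 points in total, so the champion's single path through the bracket is worth nearly as much as two full rounds of games.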

2) Matchups are not binary
This means that if one were simply trying to maximize expected point total, the final four should almost always consist of high seeds, even if they aren't necessarily the best teams in their respective regions.  This is simply because they usually have the easiest expected early round opponents.  So even if we think an 8 seed would likely beat a 1 seed (either because of matchup issues or simply because the 1 seed is overseeded while the 8 seed is underseeded), choosing the 8 seed to advance for this reason alone isn't wise as it's much less likely to even advance past the first round and make the 1-8 matchup.  Pulling arbitrary numbers out of my hat, that would mean the 8 seed would need to be something like 75% favorites against the 9 seed and 60% favorites against the 1 seed as the 1 seed is almost always close to 100% against the 16 seed.  In terms of actually executable bracket picking strategy, given Tip #1, one should attempt to work backwards by picking the champion, then the National Championship Game, and then the Final Four, and so on.  This may mean that there are matchups in one's bracket in which the winner chosen isn't necessarily the team more likely to win that specific matchup, which is something that usually befuddles network "analysts."
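To make the 1-8 example concrete, here is a small sketch of the chance each team reaches the Sweet 16.  It uses the arbitrary 75% and 60% figures from above, treats the 1 seed as certain to beat the 16 seed, and adds one number that is purely my assumption: an 85% chance that the 1 seed beats the 9 seed.

```python
def sweet16_probs(p8_over_9, p8_over_1, p1_over_9):
    """Chance that the 8, 1, and 9 seeds each win their first two games.

    Assumes the 1 seed always beats the 16 seed.
    """
    eight = p8_over_9 * p8_over_1
    one = p8_over_9 * (1 - p8_over_1) + (1 - p8_over_9) * p1_over_9
    nine = (1 - p8_over_9) * (1 - p1_over_9)
    return eight, one, nine


# 0.75 and 0.60 are the arbitrary numbers from the text; 0.85 is hypothetical.
eight, one, nine = sweet16_probs(0.75, 0.60, 0.85)
print(f"8 seed: {eight:.4f}, 1 seed: {one:.4f}, 9 seed: {nine:.4f}")
```

Under these inputs the 1 seed is still slightly more likely than the 8 seed to come out of the pod, because it also advances whenever the 9 seed wins the 8-9 game and then loses.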

3) In any pool of significant size, choose the team that is the most undervalued by the public and not necessarily the team that is the most likely to win
In the most simplified case, consider a 2 team tournament between Team A and Team B and a pool with n brackets (where n is a large number).  Team A is a 67% favorite over Team B, and 90% of people pick Team A over Team B.  Assuming a sensible payout structure (i.e. the money is split between the tying brackets or one of the tying brackets is chosen at random to win the payout), one's EV for picking Team A is X*67%/(90%*n) = 0.74*X/n, where X is the monetary size of the pool, while one's EV for picking Team B is X*33%/(10%*n) = 3.3*X/n.  Adding more rounds definitely complicates issues, as there is now more than one path to win a pool (there isn't one specific game that determines who wins).  However, in the case of extremely large pools in which one can reasonably assume that each team in the field is picked to win by at least one bracket, the champion likely must be picked correctly for a bracket to have a chance at winning.  This basically means that if one were trying to win the challenge for best bracket that both Yahoo! and ESPN offer, one should pick the team that exhibits the largest discrepancy between how often the team is chosen as a champion and the team's actual championship chances.

Now, in smaller office pools, the strategy might be significantly different, and this is because, as mentioned earlier, there can be multiple potential paths to winning the pool.  At the most extreme case, consider a 2 person office pool.  Using the same 2 team tournament example from before, one either ties an opponent who makes the same pick and splits the pot, or wins outright when the opponent picks the other team and one's own team wins.  One's EV for picking Team A is therefore X*(67%*10% + 90%/2) = 0.517*X while one's EV for picking Team B is X*(33%*90% + 10%/2) = 0.35*X.  As a result, in a 2 person pool, one should always choose the most likely outcome, while in an infinite person pool, one should always choose the outcome that is most undervalued (defined simply as % chance/% chosen), and everything in between involves some balance between the two.
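The 2 person and large-pool cases are two ends of the same calculation, which can be sketched for any pool size.  This illustration assumes every other entrant independently picks Team A 90% of the time, that the pot is split evenly among tied brackets, and that everyone ties if nobody picked the winner.  The function name and the closed-form shortcut are my own choices, not something from the original analysis.

```python
def ev_share(p_win, pick_rate, pool_size):
    """Expected fraction of the pot for one bracket in a one-game pool.

    p_win: chance the picked team wins.
    pick_rate: chance each other entrant makes the same pick (must be > 0).
    pool_size: total number of brackets, including this one.
    """
    others = pool_size - 1
    # Expected value of 1 / (1 + K), where K ~ Binomial(others, pick_rate)
    # counts the other brackets sharing this pick (standard closed form).
    share_if_right = (1 - (1 - pick_rate) ** pool_size) / (pool_size * pick_rate)
    # If the pick loses, there is only a payout when everyone made the same
    # pick, in which case the whole pool ties and splits the pot.
    share_if_wrong = pick_rate**others / pool_size
    return p_win * share_if_right + (1 - p_win) * share_if_wrong


for n in (2, 3, 4, 10, 1000):
    ev_a = ev_share(0.67, 0.90, n)   # favorite, picked by 90%
    ev_b = ev_share(0.33, 0.10, n)   # underdog, picked by 10%
    better = "A" if ev_a > ev_b else "B"
    print(f"n={n:>4}: EV(A)={ev_a:.4f}*X  EV(B)={ev_b:.4f}*X  pick {better}")
```

With these particular numbers the favorite is the better pick only in the very smallest pools, and for large n the values approach 0.74*X/n and 3.3*X/n.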


Will following any of these tips guarantee a win?  Of course not.  Similar to how end of game coaching won't matter if players simply don't hit shots, making strategic adjustments won't matter if someone simply adds no value in predicting matchups.  However, there's no reason to sacrifice any EV if one doesn't need to, and each addition at the margin can add up.  Good luck bracketing.