How successful has this been? How would you even quantify that?
Great question - ultimately, the way to quantify or test this is to look at actual/predicted ratios for different cohorts.
For instance, if the algorithm predicts a certain percentage of three-point games, for which teams is it close? For which teams is it off? Why?
These are the team-level back-tests that I've been posting here since about mid-January:
Team|Pts Predicted|Pts Achieved
ANA|36|51
STL|32|42
PIT|38|47
PHI|36|41
NAS|38|43
CAR|32|36
NYR|38|42
CBS|29|32
BUF|33|35
COL|33|35
NYI|37|39
SJS|43|44
DAL|39|40
TBL|40|41
NJD|33|33
WAS|44|43
LAK|40|38
DET|35|33
MIN|37|35
BOS|39|37
OTT|36|33
FLO|41|38
TOR|33|28
ARI|31|26
CHI|36|29
MTL|36|28
WIN|31|24
CAL|35|27
EDM|32|24
VAN|31|21
This is sorted by (actual-predicted)/predicted, so the teams that are overachieving (per the algorithm) are at the top, and the underperformers are at the bottom.
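As a rough sketch, that sort boils down to something like the following, assuming the predicted and achieved point totals live in a plain dict (the names and structure here are purely illustrative):

```python
# Minimal sketch of the back-test sort described above: compute
# (achieved - predicted) / predicted per team and order by it.

predictions = {
    "ANA": (36, 51), "STL": (32, 42), "PIT": (38, 47),
    # ... remaining (predicted, achieved) pairs from the table above
}

def relative_error(predicted, achieved):
    """Fractional over/under-achievement versus the algorithm's prediction."""
    return (achieved - predicted) / predicted

# Sort descending so overachievers land at the top, underperformers at the bottom.
ranked = sorted(predictions.items(),
                key=lambda item: relative_error(*item[1]),
                reverse=True)

for team, (predicted, achieved) in ranked:
    print(f"{team}|{predicted}|{achieved}  ({relative_error(predicted, achieved):+.1%})")
```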
I think one thing that's going on here is fairly obvious - I won't spoil it, but one common enhancement people make to SRS algorithms is to add a recency bias: make more recent games count more. The upside is that recent games are probably more predictive than games further back (the minor downside is that a recency bias can lead to a model that overreacts to everything).
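As a sketch of one common way to do this (the half-life, game format, and names here are illustrative assumptions, not necessarily what any particular model uses), you can fit margin-based, SRS-style ratings by weighted least squares, with each game's weight decaying exponentially with its age:

```python
import numpy as np

def recency_weighted_ratings(games, teams, half_life=20.0):
    """games: list of (home, away, home_margin, games_ago) tuples.
    Solves min sum_g w_g * (r_home - r_away - margin_g)^2 with sum(r) = 0,
    where w_g decays exponentially with how long ago the game was played."""
    idx = {t: i for i, t in enumerate(teams)}
    rows, margins, weights = [], [], []
    for home, away, margin, games_ago in games:
        row = np.zeros(len(teams))
        row[idx[home]], row[idx[away]] = 1.0, -1.0
        rows.append(row)
        margins.append(margin)
        # exponential decay: a game half_life games ago counts half as much
        weights.append(0.5 ** (games_ago / half_life))
    A = np.array(rows)
    b = np.array(margins, dtype=float)
    w = np.sqrt(np.array(weights))
    # extra row constrains ratings to sum to zero so the solution is unique
    A = np.vstack([A * w[:, None], np.ones(len(teams))])
    b = np.append(b * w, 0.0)
    ratings, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(teams, ratings))

# Illustrative example: ANA beat STL by 2 goals five games ago;
# STL lost to PIT by 1 goal two games ago.
teams = ["ANA", "STL", "PIT"]
sample = [("ANA", "STL", 2, 5), ("STL", "PIT", -1, 2)]
print(recency_weighted_ratings(sample, teams))
```

Lowering the half-life makes the ratings react faster to recent results; raising it pushes the model back toward treating every game equally.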
On the whole, adding a recency bias has helped my predictions to a fair degree.