The Official Site of the Montréal Canadiens
Canadiens de Montreal

The Great Stat Discussion



I've usually seen it done with GF/GA totals, not per game, so it would be 100^2 / (100^2 + 70^2), or .671 (close enough! :lol:). So we're .022 above that. Quite a small difference indeed.

IIRC, it's called that just because it kinda looks like the Pythagorean equation.

Ok, so I do understand the concept. It definitely needs a better name.

I still don't get the reasoning for applying a square to the values. For example ...

100/170 = .588

10000/14900 = .671 (squared)

1000000/1343000 = .745 (cubed)

Why is the squared value more valid than the others? If that is supposed to represent the fact that games are worth 2 points somehow, I'm going to need some explanation as to how exactly that works.

I would say something more accurate (and easier to understand and apply) would be to determine an average standings point value per goal and factor that into the equation.

Something like ... EPP (Expected Point Percentage) = GF/(GF + GA) * SPV (Standings Point Value)

*NOTE: For the record, our GF are 99 and GA are 69, excluding the "fake" goals. (The other goals are from shootout wins and losses, which the NHL displays as actual goals scored.)
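To make the exponent comparison above concrete, here is a minimal Python sketch of the Pythagorean-style formula with the exponent left as a parameter. The function name `pythagorean_expectation` is made up for illustration, and nothing here says which exponent is "right" for hockey; it only reproduces the arithmetic.

```python
def pythagorean_expectation(goals_for: float, goals_against: float,
                            exponent: float = 2.0) -> float:
    """Expected win (or point) fraction from goal totals.

    exponent=1 is the plain goal share, exponent=2 is the classic
    Bill James form, exponent=3 is the "cubed" variant from the post.
    """
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Shootout-inflated totals used in the examples: 100 GF, 70 GA.
for k in (1, 2, 3):
    print(f"exponent {k}: {pythagorean_expectation(100, 70, k):.3f}")
```

Run with the totals that exclude shootout goals (99 GF, 69 GA), the squared version comes out to roughly .673, so the correction barely moves the result.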


The annoyingly literal answer is "because that's what Bill James did with baseball". It has to do with independence and the influence of luck and randomness. Basketball's exponent is much larger than 2 because of this. People have done very interesting work on making it work properly for hockey ( http://www.hockeyanalytics.com/Research_files/Win_Probabilities.pdf and http://www.hockeyanalytics.com/Research_files/DayaratnaMiller_HockeyFinal.pdf are good examples; I've read more but I can't find them at the moment), but they've got various problems themselves.

A proper NHL exponent will account for the possibility and probability of shootouts and overtime. Notice what the league average point percentage has been over the last few years. Due to three point games and shootouts, .500 (and an even GD) isn't average, it's below average. I'm not even sure where to start right now, to be honest, since this is the first season with 3-on-3 and greatly reduced probability of reaching a shootout.

*NOTE: For the record, our GF are 99 and GA are 69, excluding the "fake" goals. (The other goals are from shootout wins and losses, which the NHL displays as actual goals scored.)

Wow, rookie mistake on my part. :huh: The one time I forget never to use NHL.com :P
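The point above about three-point games can be put in numbers. This is a rough sketch, assuming point percentage means points earned divided by two points per game played, and that every game reaching overtime or a shootout hands out three points in total. `overtime_fraction` is a hypothetical input, not a measured league figure.

```python
def league_average_point_pct(overtime_fraction: float) -> float:
    """League-wide average point percentage.

    overtime_fraction: share of games decided after regulation
    (hypothetical input). Those games award 3 points in total,
    regulation games award 2.
    """
    if not 0.0 <= overtime_fraction <= 1.0:
        raise ValueError("overtime_fraction must be between 0 and 1")
    points_awarded_per_game = 2.0 + overtime_fraction
    # Two teams play each game and each could earn at most 2 points.
    max_points_per_game = 4.0
    return points_awarded_per_game / max_points_per_game


for fraction in (0.0, 0.20, 0.25):
    print(f"{fraction:.2f} of games past regulation -> "
          f"{league_average_point_pct(fraction):.4f} average point %")
```

Under these assumptions the average only sits at .500 if no game ever goes past regulation, which is why an even record lands below the league average.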


Will have to put the rest aside for now, but I did figure out why the values were squared.

Basically, a higher quantity of either goal value results in a higher or lower probability of points.

(A team scoring 5 goals/game has a higher probability of achieving points than a team scoring 3 goals/game.)

(A team allowing 1 goal/game has a higher probability of achieving points than a team allowing 3 goals/game.)

Squaring the values is a (somewhat crude) method of reflecting that.

Thanks for the links. Should make for some interesting reading over the next few years. ;)


I'm about a third of the way through writing some stuff in R to generate snazzy Corsi/Fenwick graphs. There's nothing wrong with Natural Stat Trick (other than being run by a Senators fan :P), but I figured it would be fun. And so far, it has been! If anyone wants the R code, PM me. It's horribly messy and doesn't do much yet, but sharing is good.


Unhelpfully, it appears that War-on-ice, Natural Stat Trick, and Hockey Analysis all use different methods of determining time on ice. NST seemingly uses the NHL.com shift charts, and WOI is doing something over my head. Preliminarily, I'm summing the event seconds in the raw NHL real-time data. This is leading to a not insignificant difference. Looking at the Minnesota game:

NHL.com says P.K. had 29:02 of ice time (29.03 minutes).

WOI says 29.6 minutes (~29:36).

My count says 29.65 minutes (~29:39).

Because I'm not sure how I can do better at the moment, I'm just going to use what I'm using right now. If anyone has any ideas or advice, I'm all ears.
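For comparing the three TOI figures above on one scale, a small Python sketch of the clock conversions may help. The helper names are hypothetical, and it assumes the NHL.com value is a plain `MM:SS` string.

```python
def mmss_to_minutes(clock: str) -> float:
    """Convert an 'MM:SS' ice-time string to decimal minutes."""
    minutes, seconds = clock.split(":")
    return int(minutes) + int(seconds) / 60


def minutes_to_mmss(minutes: float) -> str:
    """Convert decimal minutes to 'MM:SS', rounded to the nearest second."""
    total_seconds = round(minutes * 60)
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


nhl_toi = mmss_to_minutes("29:02")   # NHL.com shift chart figure
event_toi = 29.65                    # summed event seconds, in minutes

print(f"NHL.com: {nhl_toi:.2f} min")
print(f"Event sum: {minutes_to_mmss(event_toi)}")
print(f"Gap: {round((event_toi - nhl_toi) * 60)} seconds")
```

The gap between the NHL.com figure and the event-second count works out to about 37 seconds for this one player in one game.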


Aren't the NHL.com shift/TOI charts done by the actual stat keepers at the game?

If you aren't sure which is more accurate, perhaps try totalling the ice time for all defensemen for a game and see. It would have to be a game you watched, where you know whether a team elected to go with 1 defense on the PP or PK (or for some other reason). Figure out how much time a team should have had to split amongst the defense, then see which TOI comes closest.


Yeah, the NHL.com TOI data is directly from the arena crew, but so is the event log. I'm going to look into testing the likely inaccuracy of this when I can, but I fear it'll take far too much time watching and meticulously logging a representative sample of games (and ideally, with a few other people doing it, too).

I'm tempted to just use the event timing, because I'm lazy and don't feel like writing something to parse the NHL.com TOI page. There's a whole lot of events for every team in every arena, so at least it's wrong for everyone in a quasi-similar fashion. This is somewhat risky territory to be conjecturing in with zero evidence, however.


  • 2 weeks later...

Curiously, what are the mean and standard deviation for the PDO statistic? The mean, I assume, is right at 1.

The mean between 2007 and today is 100.0052. 2007-15 is 100.0017. 82 game seasons, 99.9981.

Between 2007 and the season in progress, the SD is 1.179554. 2007-15, it's 1.147745. Only counting complete 82 game seasons since 2007, it's 1.073516.
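For anyone wanting to reproduce this kind of summary, here is a small sketch. It assumes PDO is defined the usual way (shooting percentage plus save percentage, on a 100 scale) and that the spread is a sample standard deviation. The team values in the list are invented for illustration and are not real seasons.

```python
import statistics


def pdo(goals_for: int, shots_for: int,
        goals_against: int, shots_against: int) -> float:
    """PDO on a 100 scale: shooting % plus save %."""
    shooting_pct = 100 * goals_for / shots_for
    save_pct = 100 * (1 - goals_against / shots_against)
    return shooting_pct + save_pct


# Hypothetical team-season PDO values, for illustration only.
sample_pdos = [98.7, 100.4, 101.2, 99.6, 100.1]

print(f"mean: {statistics.mean(sample_pdos):.4f}")
print(f"sample SD: {statistics.stdev(sample_pdos):.4f}")
```

Since every goal for one team is a goal against another, the league-wide mean has to land very close to 100, which matches the figures quoted above.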


Thank you, kind sir! :)

So, a consistently high PDO means you're either A] extremely lucky, or B] extremely good? I guess most teams should take note that they will gravitate towards the mean, but a team that can maintain a consistently high PDO (say, over an entire season) must have exceptional players.


Our very own powerplay2009 has done some work on this exact thing:

http://www.habseyesontheprize.com/2014/11/19/7245133/understanding-pdo-all-teams-are-not-created-equal

http://www.habseyesontheprize.com/2014/5/6/5687670/does-the-importance-of-pdo-vary-with-possession

High PDO with good possession performance will probably not result in as significant regression as it "should", or people would expect. High PDO with abysmal possession performance will likely last long enough to give the hockey media the opportunity to write a bunch of carbon-copy "good story coming into training camp" stories. High PDO with mediocre possession performance can end at any minute, and it'll be very, very ugly.


Thanks 93. I'm going to have a gander at these articles now!


Also, do people subscribe more to Corsi or Fenwick on here? Does one correlate to success in the standings better than the other?

This is tricky. There's arguments for both: Some people think blocking shots is significantly influenced by the skill of the opponent blocking the shot, thus counting it in possession metrics is potentially colouring a team/individual's data too much. Others think the additional events counted in Corsi make it more useful. But the important things apply to either:

Look at 5-on-5, not all situations.

When possible, adjust for score effects.

Pay no attention to "Close" or "Tied" data, as they're unacceptably limited and worse than 5-on-5. I arrived late in the game on this one, regrettably.

Here's a fantastic article from one of the best in the business:

http://hockey-graphs.com/2014/11/13/adjusted-possession-measures/

In his work there, Corsi was superior. I post 5-on-5 Fenwick percentages after every period because I don't currently have score adjustment in my code. This is, in all honesty, a lazy excuse for indulging in habit and actually likely less accurate. So I should change that. When/if I get score adjustment working, I'll definitely start posting that instead.


I prefer Corsi myself the majority of the time. When you remove blocked shots, as for Fenwick, you have to take into account whether one team is good at blocking shots, or one team is bad at getting shots through. Not something you can reasonably take into consideration without deeper analysis.

If you compare Corsi to SOG, you can determine whether one team did a good job at preventing pucks on net or one did a poor job at directing pucks on net.

Fenwick lies somewhere in between, whereas I find you can derive more from Corsi or SOG both individually and combined. Fenwick tries to accomplish in one statistic, what a combination of Corsi and SOG do a better job of, imo.

For example, you can use Corsi and SOG, add or remove missed and blocked shots, and get several different permutations. Fenwick is simply a name for one of those permutations.

Corsi minus SOG

Corsi minus SOG minus blocked shots

Corsi minus SOG minus missed shots

Corsi minus blocked shots (Fenwick)

Corsi minus missed shots

SOG plus missed shots (Fenwick)

SOG plus blocked shots

SOG plus blocked shots plus missed shots (Corsi)
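The permutations listed above all come from the same three counts, which a short Python sketch can show. `ShotCounts` is a hypothetical container and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Shot attempts for one team, split by outcome."""
    on_goal: int   # SOG
    missed: int
    blocked: int

    @property
    def corsi(self) -> int:
        # All shot attempts: on goal, missed, and blocked.
        return self.on_goal + self.missed + self.blocked

    @property
    def fenwick(self) -> int:
        # Unblocked attempts: SOG plus missed shots.
        return self.on_goal + self.missed


game = ShotCounts(on_goal=30, missed=12, blocked=15)  # invented numbers
print("Corsi:", game.corsi)
print("Fenwick:", game.fenwick)
print("Corsi minus SOG:", game.corsi - game.on_goal)
```

Both routes to Fenwick in the list agree by construction: Corsi minus blocked shots is the same number as SOG plus missed shots.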

As for which statistic(s) most correlate to standings, that would require some research and comparison over several seasons, but I would "think" the lowest Corsi to SOG ratio would prove to be the strongest indicator. Basically, the team that converts the highest percentage of their shot attempts into actual shots on goal, "should" have the most success, or more success than teams who miss nets or have a high portion of their shot attempts blocked.

All other things equal, logical assumptions would be that:

1. A team with 20 Corsi events and 15 SOG, should have more success than a team with 25 Corsi events and 10 shots on goal. High Corsi does not necessarily mean more SOG. So a team with lower Corsi, but more SOG, is more efficient. They have a higher conversion rate, and should therefore have more success. This would suggest that while one team is better at generating shot attempts, the other team is better at getting the puck on net.

2. A team with 20 Corsi events and 16 SOG, should have more success than a team with 25 Corsi events and 16 shots on goal. Again, despite both teams having the same number of shots on goal, the team that is converting the higher percentage of their attempts into shots on goal, should have more success over time. While both teams are generating the same number of shots on goal, if you pro-rate the number of Corsi events, the team with the higher conversion rate of Corsi to SOG, will eventually overtake the other team.

Team A with 10 Corsi would have 8 SOG. With 40 Corsi, they would have 32 SOG.

Team B with 10 Corsi would have 6.4 SOG. With 40 Corsi, they would have 25.6 SOG.
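The Team A and Team B projections above are just a conversion rate applied to a different number of attempts. A minimal sketch, with hypothetical function names and the assumption that the conversion rate stays constant as attempts scale:

```python
def sog_conversion_rate(shots_on_goal: float, corsi_events: float) -> float:
    """Share of shot attempts (Corsi) that end up as shots on goal."""
    return shots_on_goal / corsi_events


def projected_sog(rate: float, corsi_events: float) -> float:
    """Shots on goal expected from a given number of attempts."""
    return rate * corsi_events


team_a = sog_conversion_rate(16, 20)  # 16 SOG on 20 Corsi events
team_b = sog_conversion_rate(16, 25)  # 16 SOG on 25 Corsi events

for attempts in (10, 40):
    print(f"{attempts} Corsi: Team A {projected_sog(team_a, attempts):.1f} SOG, "
          f"Team B {projected_sog(team_b, attempts):.1f} SOG")
```

Note that the Corsi to SOG ratio discussed in this thread is the inverse of this rate, so a lower ratio means a higher conversion rate.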


Taking a cursory look at last season, 5-on-5 Corsi/SOG ratio's Pearson correlation coefficient to point percentage is -0.0033, and correlation to 5-on-5 goals for is -0.045. There appears to be more noise than information there, which makes a bit of sense. The miss-block-shot-goal envelope is heavily influenced by randomness for everyone, the sustainably better teams just do it more.
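For reference, the Pearson coefficient mentioned above can be computed by hand in a few lines. This is a generic sketch, not the code used for the figures in this post, and the two lists are invented numbers rather than real team data.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Invented values: Corsi/SOG ratio and point percentage for five teams.
ratios = [1.78, 1.85, 1.91, 1.83, 1.96]
point_pcts = [0.561, 0.488, 0.598, 0.543, 0.537]
print(f"r = {pearson_r(ratios, point_pcts):.3f}")
```

A value near zero, like the ones reported above, means the ratio carries essentially no linear information about the other variable.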


I would guess that every team generally encounters similar percentages of missed and blocked shots, so that might make sense. There would probably be low correlation league-wide. If there were some outliers, they would probably be identifiable as being significantly above average in shot selection or blocking. So you still might find a correlation for individual team success over periods of high or low Corsi/SOG ratios, but they would likely regress to the mean over time, unless of course, a team was able to sustain a high ratio for or low ratio against over the course of a season due to system or coaching factors.

Our December for example, (without looking), I feel like our Corsi to SOG ratio would show as poor. We were getting lots of opportunities, but the shot selection and missed nets "felt" abundant (I would say the shot quality was also generally poor, but that's more difficult to analyze).


Here's the top and bottom 10 for it, since 2007-08, with thanks to War-on-ice for the season tables:

Chicago 2007-08—1.649662 Corsi to SOG, .537 point %, 9th in West

Colorado 2007-08—1.683151 Corsi to SOG, .579 point %, 6th in West

Chicago 2008-09—1.695411 Corsi to SOG, .634 point %, 3rd in West

Columbus 2009-10—1.713287 Corsi to SOG, .482 point %, 14th in West

St. Louis 2007-08—1.72299 Corsi to SOG, .482 point %, 14th in West

Buffalo 2007-08—1.724138 Corsi to SOG, .549 point %, 10th in East

Phoenix 2007-08—1.728587 Corsi to SOG, .506 point %, 12th in West

Colorado 2009-10—1.729227 Corsi to SOG, .579 point %, 8th in West

Columbus 2007-08—1.729634 Corsi to SOG, .488 point %, 13th in West

New Jersey 2009-10—1.731992 Corsi to SOG, .628 point %, 2nd in East

Toronto 2010-11—2.012429 Corsi to SOG, .518 point %, 10th in East

San Jose 2014-15—2.010183 Corsi to SOG, .543 point %, 13th in West

Toronto 2011-12—2.005467 Corsi to SOG, .488 point %, 13th in East

Dallas 2014-15—1.997484 Corsi to SOG, .561 point %, 10th in West

Carolina 2010-11—1.994609 Corsi to SOG, .555 point %, 9th in East

Washington 2014-15—1.987439 Corsi to SOG, .616 point %, 5th in East

Los Angeles 2010-11—1.979911 Corsi to SOG, .598 point %, 7th in West

Dallas 2010-11—1.973269 Corsi to SOG, .579 point %, 9th in West

Calgary 2014-15—1.965922 Corsi to SOG, .591 point %, 8th in West

Montreal 2009-10—1.961739 Corsi to SOG, .537 point %, 8th in East

Ours in December was 1.870712. Compare that to the league average since 2007-08, 1.848398.

I think what you're conceptually looking for here is a scoring chance metric; War-on-ice has one of their own (see a blog entry here: http://blog.war-on-ice.com/new-defining-scoring-chances/), other people have theirs, and so on. The NHL RTSS data which the War-on-ice nhlscrapr package gathers includes location and shot type data, but I'm not familiar enough with R yet to get deep into it.

BTW, if I haven't mentioned it yet, you (and anyone interested in this!) should definitely get R. There's a really friendly integrated development environment called RStudio, which is free and available on the most commonly used platforms: https://www.rstudio.com/ I'll be happy to help with getting the NHL package set up and showing some of the basic functions.
