The Official Site of the Montréal Canadiens
Canadiens de Montreal

The Great Stat Discussion



  • 2 weeks later...

Okay, so I'm getting around to putting in significant work on a few fancy stat projects, but every time I get to work on one, looking at the data makes more ideas pop into my head. If anyone wants to learn a thing or two about fancy stats and do some analysis in their free time, I'm more than happy to pass an idea or two off to someone else.

Things I've learned so far:

When the score is close, 5v5 save percentage has more bearing on success (points per game) than shooting percentage.

When the score isn't close, 5v5 shooting percentage is more important.

Turns out the correlation between overall PDO and success is stronger taken at 5v5 and 5v5 close, at least this year (I want to investigate a larger sample eventually).

I want to look into whether the importance of shooting and saving percentages (5v5, not 5v5 close) fluctuates with possession. My theory is that a team like, say, Chicago, won't be affected as much by a dry spell in shooting because they just shoot the puck more than everyone else. Not sure what the best way to do that is yet though, and


I commented there, but just wanted to add here that you did a great job, and also recommend anyone with an interest in the relationship between PDO, possession, and results check this out.

I'm wondering if there will ever come a time where we can come up with a certain "base" or "normalized" PDO for each team besides assuming they will always regress to 100.

I'm thinking maybe something that looks at each player's 3 year rolling average of shooting %, shot volume/60, and 5 on 5 average TOI. And for goalies, a 3 year rolling save % and average games played per season. You'd then come up with an expected shooting percentage and save percentage for your team, and voila: base or expected PDO.

One issue I foresee with this would be not enough historical data for rookies or sophomores (maybe the solution would be using a league average number).
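
Here is a rough sketch of how that "expected PDO" idea could be put together in Python. Everything in it is an assumption for illustration only: the field names, weighting each skater by expected shots (shot rate times ice time), weighting goalies by games played, and the league-average fallback values used for rookies.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical league-average fallbacks for players without enough history.
LEAGUE_AVG_SH_PCT = 0.08
LEAGUE_AVG_SV_PCT = 0.92


@dataclass
class Skater:
    sh_pct_3yr: Optional[float]  # 3-year rolling 5v5 shooting %, None if no history
    shots_per_60: float          # 5v5 shot volume per 60 minutes
    toi_per_game: float          # average 5v5 minutes per game


@dataclass
class Goalie:
    sv_pct_3yr: Optional[float]  # 3-year rolling save %, None if no history
    games_per_season: float      # average games played per season


def expected_shooting_pct(skaters: List[Skater]) -> float:
    """Shot-weighted average of each skater's rolling shooting %."""
    total_shots = 0.0
    total_goals = 0.0
    for s in skaters:
        shots = s.shots_per_60 * s.toi_per_game / 60.0  # expected shots per game
        pct = s.sh_pct_3yr if s.sh_pct_3yr is not None else LEAGUE_AVG_SH_PCT
        total_shots += shots
        total_goals += shots * pct
    return total_goals / total_shots


def expected_save_pct(goalies: List[Goalie]) -> float:
    """Games-weighted average of each goalie's rolling save %."""
    total_games = 0.0
    weighted = 0.0
    for g in goalies:
        pct = g.sv_pct_3yr if g.sv_pct_3yr is not None else LEAGUE_AVG_SV_PCT
        total_games += g.games_per_season
        weighted += pct * g.games_per_season
    return weighted / total_games


def expected_pdo(skaters: List[Skater], goalies: List[Goalie]) -> float:
    """Expected PDO on the usual 100 scale."""
    return 100.0 * (expected_shooting_pct(skaters) + expected_save_pct(goalies))
```

Weighting by expected shots means a high-volume shooter pulls the team number toward his own percentage, which seems to be the point of including shot volume and TOI in the first place. Other weightings would be just as defensible.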



Thanks! I originally tackled PDO as part of projecting standings using nothing but fancy stats, and I wasn't really satisfied with the knowledge that was out there on the internet. So I started compiling as much data as I could and looking at it some more, and what I found is that, while PDO is helpful, it's not nearly as much of a factor as it's made out to be.

I mean, think about it. A lot of people will say that Colorado lucked into having the season that they did, and PDO definitely supports it. But Colorado fans will argue they have a goalie with world-class potential who is just starting to put it all together, and top-end forward talent that very few teams around the league can match, so of course they have high shooting and saving percentages. On top of that, Roy puts them in a position to make their own luck, like by pulling the goalie early (or on time when he's supposed to be pulled, as a lot of stats studies suggest). Are they wrong? I'm honestly not so sure, because the only counterargument to that is "well, you had a really high PDO so you must be lucky". Is that wrong? Well, I think just pointing to a number that's greater than 100 and supposing it's therefore unsustainable, without looking further into it, is pretty foolish.

I looked at each team's regression to the mean for 5v5 PDO, and it's drastically overstated. Yes, it happens, but it's pretty slight after about the 40-game mark. Meanwhile, we have teams like Toronto, Anaheim, and New Jersey who have had pretty "extreme" PDO values for a sample size that now exceeds 100 games. To say that only luck can explain that, to me, is a really tough sell. I'm not going to stand on a soapbox and say everything we know about PDO is wrong, but I will sit here and explain why I think the general knowledge on PDO is probably different from reality, and possibly by a significant margin. We just don't know that much about it. So I want to investigate it more still, but of course I don't have the time.

I'm thinking maybe something that looks at each player's 3 year rolling average of shooting %, shot volume/60, and 5 on 5 average TOI. And for goalies, a 3 year rolling save % and average games played per season. You'd then come up with an expected shooting percentage and save percentage for your team, and voila: base or expected PDO.

One issue I foresee with this would be not enough historical data for rookies or sophomores (maybe the solution would be using a league average number).

I really like this idea. I'll probably play around with it in the fall if nobody else has by then, but I think your layout for the general formula is a really good starting point. A few more things to consider would be coaching and player changes, and then trying to find a way to integrate free agency or trades to get updated values, although I imagine for most players the difference would be pretty minor.

And as a final note, I'm in the process of making another fanpost (pretty lengthy, but hopefully it's because the explanations are better and things are more clear this time) about special teams trends and performances. I don't dwell on fancystats too much in this one, though I will make the argument that it's time we try to use (or create) fancystats to better understand special teams performance, because I perceive that as a pretty big hole in our understanding. There are also some interesting statistical phenomena that I can't explain, so I'm excited to get that out there and see if there is somebody who can.


Here's my next post to EOTP. This one is on special teams. Follow-up to come, hopefully by this weekend. (Link)

Great job PP, I actually understood the whole thing. Interesting that special teams have a bigger effect than PDO, and also interesting that reffing doesn't determine games as much as people seem to think. I don't really have an answer to any of your questions, but hopefully someone else does.


  • 3 months later...

Has anyone read anything interesting on special teams and fancy stats? That seems to be a huge hole in our understanding of the game statistically right now. Special teams are pretty much ignored and I can't figure out why. I might start working on seeing what predicts special teams performance once I get time in a few weeks.


  • 2 weeks later...

I think I found a boundary inside which PDO regression doesn't exist, and outside of which regression would be expected to occur. That point seems to be +/- 1.3. So if a team has a PDO that is higher than 101.3 or lower than 98.7, expect something is up. If they are inside that boundary, then there is a pretty good chance that team has a higher shooting/saving percentage than normal due to being talented.
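
As a quick illustration of that rule of thumb, here is a minimal Python sketch. The function name and the wording of the two labels are made up for this example; only the 1.3 boundary comes from the post above, and PDO is assumed to be on the 100 scale.

```python
def pdo_outlook(pdo: float, boundary: float = 1.3) -> str:
    """Classify a PDO (100 scale) against a +/- boundary around 100."""
    if abs(pdo - 100.0) > boundary:
        # Outside the band: regression toward 100 would be expected.
        return "regression expected"
    # Inside the band: the percentages may simply reflect talent.
    return "within boundary"


print(pdo_outlook(102.0))
print(pdo_outlook(100.5))
```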

On edit: I'll write my theory as to why in more detail next week. Hopefully early next week.


Can someone please explain to me what Corsi, Fenwick, and PDO are? I have always had to work with some form of stats and don't mind them too much, but I need to know what they are first.

GO HABS GO

You can think of both Corsi and Fenwick as a kind of +/- stat for shot attempts. Each time someone is on the ice and their team takes a shot (even if it misses the net or gets blocked) their Corsi number goes up one. Each time a person is on the ice and the other team takes a shot their Corsi number goes down one. So basically people use it to roughly measure possession, because if your team is shooting a lot it means that they had to have the puck on their stick a lot, too. Fenwick is the exact same as Corsi except it doesn't count blocked shots.
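
If it helps to see it as a calculation, here is a small Python sketch of the +/- counting described above. The event format (a team code plus one of "on_goal", "missed", or "blocked") and the sample events are invented for the example and are not from any real game.

```python
from typing import List, Tuple


def shot_differentials(events: List[Tuple[str, str]], team: str) -> Tuple[int, int]:
    """Return (corsi, fenwick) for `team` from a list of shot attempts.

    Each event is (shooting_team, kind), where kind is one of
    "on_goal", "missed", or "blocked" (a hypothetical encoding).
    """
    corsi = 0
    fenwick = 0
    for shooting_team, kind in events:
        sign = 1 if shooting_team == team else -1
        corsi += sign              # Corsi counts every shot attempt
        if kind != "blocked":
            fenwick += sign        # Fenwick leaves out blocked shots
    return corsi, fenwick


# Made-up sequence of attempts while a given player is on the ice.
sample = [
    ("MTL", "on_goal"),
    ("MTL", "blocked"),
    ("MTL", "missed"),
    ("MTL", "blocked"),
    ("BOS", "on_goal"),
    ("BOS", "blocked"),
]
print(shot_differentials(sample, "MTL"))  # prints (2, 1)
```

Here Montreal has four attempts to Boston's two (Corsi +2), but once the blocked shots are dropped it is two attempts to one (Fenwick +1).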

PDO is the team's shooting % + the team's save %. (So say the team scores on 9% of its shots and has a save percentage of 0.920: the PDO would be 0.09 + 0.92 = 1.01, usually quoted as 101.) The average PDO of the whole league is 100, and the theory is that most teams should be pretty close to that average. If a team has a high PDO then it means that they've been getting by either by scoring on a higher percentage of shots than normal or by riding a hot goaltender, neither of which is really a good long-term solution. So when someone says a team has a high PDO they're often saying that they've basically just been kind of lucky and that they're probably due for a fall.
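
The same arithmetic as a short Python sketch. The function and parameter names are just for illustration, and "shots" here means shots on goal.

```python
def pdo(goals_for: int, shots_for: int, goals_against: int, shots_against: int) -> float:
    """Shooting % plus save %, scaled so the league average is 100."""
    shooting_pct = goals_for / shots_for
    save_pct = 1.0 - goals_against / shots_against
    return 100.0 * (shooting_pct + save_pct)


# 9 goals on 100 shots (9%) and 8 goals allowed on 100 shots (.920 save %).
print(round(pdo(9, 100, 8, 100), 1))  # prints 101.0
```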

Having said that, obviously there are some teams which do have better shooters or especially goaltending, meaning that they might have a higher than normal PDO but it might not be luck. Part of what PowerPlay2009 has been doing (at least I think he has, I've still got the article marked to read later :S) is trying to figure out when a high PDO number is legit because the team is actually better, and when it just represents a team getting lucky.

And also, before you ask, I have no idea what PDO stands for :P


Guest habs1952

Frequently Asked Question No. 1 – What does PDO stand for?

You’ll have to ask its creator, an Edmontonian who commented at old Oilers blogs under the Internet handle “PDO”. A lot of the problem people have in understanding the new hockey statistics is that they don’t stand for anything. In baseball you have OBP, ISO and WAR, which stand for on-base percentage, isolated power and wins above replacement. In the same field of hockey, you have Corsi, PDO, and Fenwick, which are all named after the people who are generally credited with inventing the modern concept.

How is it pronounced? Well, whenever I’ve mentioned it in conversation (which is more often than you might imagine) I spell it out. “Pee” “Dee” “Oh”.

You're welcome. :)


  • 5 months later...

Why we should be trying to do better than Corsi and Fenwick

I found a couple of interesting articles while I was browsing the web this morning. The first one is here:

http://rinkstats.com/2013/12/why-popular-advanced-stats-are-bad-at/

but the one that I found more interesting is the follow-up to that article, from last spring, here:

http://rinkstats.com/2014/05/corsi-and-fenwick-suck-or-why-we-should/

The long and short of it is that team Fenwick and Corsi might not have a whole lot of value when looking at them on a game-by-game basis. This isn't to say that the stats are completely worthless - for example he didn't look at all at individual player stats, or at how these stats affect winning percentages over a season.

But what the numbers here do show is that there's not a whole lot of value in doing something that I've been doing pretty frequently these last few years - looking at the Corsi or Fenwick numbers after a game and assuming that they are a good indication of how 'good' or 'bad' the team played. In fact, contrary to what I would have thought, the team that has the better Corsi percentage at the end of regulation is actually more likely to have lost the game.

Now the reason for this is the so-called "score effects"; when a team is trailing they're going to start throwing a lot more pucks at the net in the hopes that something will trickle in. The next few graphs show that this effect diminishes when you look at games that are within two goals, diminishes further when you look only at games that are within one goal, and predictably disappears when you look only at games that are tied. In all of these cases the team with the better Corsi % is indeed more likely to win, but it's still not a really big difference.

The kicker, though? In every single case, straight-up shots for and against were a better predictor than both Corsi and Fenwick. Who'd have thought it? :)


  • 8 months later...

Greetings, friends!

I've been working on a team data set for analytics stuff for a few months, and I think it's finally acceptable enough to put out there:

https://www.dropbox.com/s/e6k5spo8rrwvmkp/NHL%20Possession%20Data%202007-15%201.3.3.xlsx

It's got almost (but not quite) all the publicly available data you might want for looking at the performance of NHL teams since 2007. If you see something missing, or think some of the data might be better presented in a different format, please don't hesitate to post and tell me about it. Also, if you spot errors I'd love to know so I can fix them.

Coming soon: CSV format files for SAS and R.


That's a LOT of data! Kudos for the effort to gather it.

I only took a quick look for now and have one small suggestion, lock the team column so it doesn't scroll.

Easier to associate which stats go with which team when you are looking at the right-most columns.

Looks well organized. If I think of a possible reason for stat discrepancies between the 2 sites, I'll let you know. :)


I only took a quick look for now and have one small suggestion, lock the team column so it doesn't scroll.

Easier to associate which stats go with which team when you are looking at the right-most columns.

Excellent idea. I had forgotten all about that feature. Done!

1.3.4: https://www.dropbox.com/s/dxpea4avyp4bx6q/NHL%20Possession%20Data%202007-15%201.3.4.xlsx


After some investigation, I've decided that War-on-ice's data are better for this purpose. I downloaded the all situations, 5-on-5, 5-on-5 close, 5-on-5 tied, and 5-on-5 score adjusted data from their team data page (which is here: http://war-on-ice.com/teamtable.html), and stuck it in the sheet. So, welcome to version 2.0:

https://www.dropbox.com/s/npszk5f8juvtca6/NHL%20Possession%20Data%202007-15%202.0.0.xlsx

War-on-ice includes the underlying figures for possession calculations (missed/blocked shots). I fixed the TOI numbers, and also added scoring chances just in case anyone's interested in that. All time columns are now in minutes instead of the unwieldy MM:SS format which was impossible to do calculations with in Excel without extra work.
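
For anyone working from raw MM:SS values outside Excel, the conversion to decimal minutes is a one-liner. This is only a Python sketch with a made-up function name; it is not how the spreadsheet itself was built.

```python
def mmss_to_minutes(value: str) -> float:
    """Convert a time-on-ice string like "48:30" to decimal minutes."""
    minutes, seconds = value.split(":")
    return int(minutes) + int(seconds) / 60.0


print(mmss_to_minutes("48:30"))  # prints 48.5
```

The minutes part can be larger than 59, so season totals such as "4012:45" convert the same way.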


  • 3 weeks later...
  • 1 month later...

Unless I'm misunderstanding something ... this is our season so far?

Habs GF = 3.19 GA = 2.23

3.19 * 3.19 / [(3.19 * 3.19) + (2.23 * 2.23)]

= 10.1761 / (10.1761 + 4.9729)

= 10.1761 / 15.149

= .6717 (expected point percentage)

.6935 is our actual point percentage, so the difference (or Delta P%) is .0218?

So our actual point percentage (for the season) is ~.022 (or 2.2%) above expectations in relation to our goal production/prevention?

I've usually seen it done with GF/GA totals, not per game, so it would be 100^2 / (100^2 + 70^2), or .671 (close enough! :lol:). So we're .022 above that. Quite a small difference indeed.

(I'm also curious as to why it is termed Pythagorean, since applying a square to the variables appears rather arbitrary, so maybe that's why I'm not getting it. Pythagorean implies a direct correlation between 3 squared values in a triangle; I don't see a relationship between the values here. You can't determine one of these values based upon the others, and the actual formula is 4 values, where x = GF^2 / (GF^2 + GA^2).)

IIRC, it's called that just because it kinda looks like the Pythagorean equation.
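
For anyone who wants to check the arithmetic above, here is a minimal Python sketch of the formula as it is used in this thread. The function name is made up, and the exponent of 2 is simply the one used above.

```python
def pythagorean_expectation(goals_for: float, goals_against: float,
                            exponent: float = 2.0) -> float:
    """Expected point percentage: GF^n / (GF^n + GA^n)."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Per-game numbers from the post above.
expected = pythagorean_expectation(3.19, 2.23)
print(round(expected, 4))           # prints 0.6717
print(round(0.6935 - expected, 4))  # prints 0.0218 (actual minus expected)

# Rounded season totals give nearly the same answer.
print(round(pythagorean_expectation(100, 70), 3))  # prints 0.671
```

Per-game rates and season totals agree because multiplying both GF and GA by the same number of games cancels out of the ratio. The tiny gap between .6717 and .671 comes only from rounding the totals to 100 and 70.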

