The Official Site of the Montréal Canadiens
Canadiens de Montreal

The Great Stat Discussion



^

See, if we could assign a value of quality to those two Corsi events by weighting them, that center ice shot would be like a 0.1 event, while the other could be more of a 1.2.

So the stat could be more indicative of the two teams' performances. For example:

Team A takes 24 shots with an average weight of 1.0. (Say 12 higher quality chances at 1.1 and 12 lower quality at 0.9)

Team B takes 30 shots with an average weight of 0.6 (10*1.0, 10* 0.2, 10* 0.6)

So while team B had more shots (and a higher Corsi as currently measured), their adjusted Corsi events based on shot quality would be 18, against 24 for team A. The point being to represent that team A actually outperformed team B in that regard.
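To make the arithmetic concrete, here's a rough Python sketch of that weighting. The weights are the made-up ones from the example, and `weighted_corsi` is just a name for illustration, not an established stat or library function.

```python
def weighted_corsi(shot_groups):
    """Sum shot attempts, each scaled by a hypothetical quality weight.

    shot_groups: iterable of (attempt_count, weight) pairs.
    """
    return sum(count * weight for count, weight in shot_groups)


# Illustrative numbers taken from the example above, not real game data.
team_a = [(12, 1.1), (12, 0.9)]
team_b = [(10, 1.0), (10, 0.2), (10, 0.6)]

raw_a = sum(count for count, _ in team_a)  # unweighted attempts, team A
raw_b = sum(count for count, _ in team_b)  # unweighted attempts, team B
adj_a = weighted_corsi(team_a)             # quality-adjusted, team A
adj_b = weighted_corsi(team_b)             # quality-adjusted, team B

print(f"Team A: raw {raw_a}, adjusted {adj_a:.1f}")
print(f"Team B: raw {raw_b}, adjusted {adj_b:.1f}")
```

Team B leads on raw attempts, team A leads once the weights are applied. The hard part, of course, is where the weights come from.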

Just throwing every puck on net is becoming (or could become) a misleading way to inflate some advanced stats as currently measured. We've been doing a lot of that lately, which is one way of attempting to break out of a slump, but when a slump lasts for this long, there has to be some consideration that those shot numbers are inflated. I think some people's perception of how we have been performing has been skewed by the fact we have been outshooting opponents, without taking that into consideration.

---

Another thing that could be considered is a rolling, score-adjusted Corsi based on performance over "x" number of games. A team with a losing record, or one that has struggled to produce over a stretch of games, may be inclined to shoot more. Similar to score effects, but applied over the course of multiple games.

It's entirely possible there is nothing to be gleaned there, but I thought it was worth mentioning.
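As a starting point for checking that, here's a minimal Python sketch that computes a rolling Corsi-for share next to a rolling points percentage over the last "x" games. The dict keys (`cf`, `ca`, `points`) are an assumed layout, and how you would actually adjust Corsi from the recent record is left open. This only lines the two series up so they can be compared.

```python
from collections import deque


def rolling_corsi(games, window):
    """Return (game_index, rolling CF share, rolling points pct) tuples.

    games: list of dicts with hypothetical keys 'cf' (shot attempts for),
    'ca' (shot attempts against) and 'points' (0, 1 or 2 standings points).
    Nothing is emitted until `window` games are available.
    """
    recent = deque(maxlen=window)  # automatically drops the oldest game
    results = []
    for index, game in enumerate(games):
        recent.append(game)
        if len(recent) < window:
            continue
        cf = sum(g["cf"] for g in recent)
        ca = sum(g["ca"] for g in recent)
        points = sum(g["points"] for g in recent)
        share = cf / (cf + ca) if (cf + ca) else 0.0
        results.append((index, share, points / (2 * window)))
    return results
```

If struggling teams really do shoot more, the rolling CF share should tend to rise while the rolling points percentage falls.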


Another thing that could be considered is a rolling, score-adjusted Corsi based on performance over "x" number of games. A team with a losing record, or one that has struggled to produce over a stretch of games, may be inclined to shoot more. Similar to score effects, but applied over the course of multiple games.

I don't currently have game-by-game win/loss/points records, but when I can get that I'll definitely look into it.


The nhlscrapr package for R from War-on-ice is extremely powerful and convenient. R in general is very friendly for doing statistics work. However, R is pretty slow, and as a general purpose language it's not great. Since the gathering of RTSS play-by-play records requires no statistical functions (except finding means, which isn't exactly difficult to do), I've decided to write my own parser in Python. It's not quite ready for public release yet, but in my testing it's been fairly robust and very fast. Retrieving and processing an RTSS record with nhlscrapr takes me about 12-17 seconds, whereas my alpha-quality pyrtss program is doing it in less than half a second. I was going to name it pysani (Python Scraper of Analytical NHL Information) or pyjarvi (Python JSON Assembler of RTSS Variable Information), but apparently that's too cheesy. :P


Okay, semi-philosophical question on event timing and deriving ice time. Yes, this again. :P

When do we care about a player being on the ice? Obviously, all the time. But this isn't currently possible to determine with publicly available data. A shift where nothing happens (as in, no shot attempts, hits, stoppages, or anything) is completely missed by the NHL RTSS play-by-play. We won't even know it happened, and such phantom shifts could actually be important. I imagine we'd like to know whether one player has significantly more of these than another when doing comparisons, and we have no idea at the moment. This also gets into discussion about shift changes; RTSS play-by-play records lack them completely. War-on-ice and nhlscrapr infer them by detecting when the players on ice for either the home or away teams change between non-stoppage events. To assign a time to these inferred changes, they take a mean of the time of the events surrounding them (I think). I haven't implemented this yet, but it's something I'm considering.
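For illustration, the inference described above could be sketched in Python roughly like this. It's a guess at the idea, not what nhlscrapr actually does: the event layout (`time`, `stoppage`, `on_ice`) is assumed, it handles one team at a time, and it skips any pair of events where either one is a stoppage.

```python
def infer_changes(events):
    """Infer line changes from consecutive play-by-play events for one team.

    events: list of dicts with hypothetical keys 'time' (seconds elapsed),
    'stoppage' (bool) and 'on_ice' (set of player ids), in time order.
    Returns (estimated_time, players_off, players_on) tuples.
    """
    changes = []
    for prev, cur in zip(events, events[1:]):
        # Only compare pairs of non-stoppage events (an assumed reading
        # of "between non-stoppage events").
        if prev["stoppage"] or cur["stoppage"]:
            continue
        if prev["on_ice"] != cur["on_ice"]:
            # Estimate the change time as the mean of the two event times.
            midpoint = (prev["time"] + cur["time"]) / 2
            went_off = prev["on_ice"] - cur["on_ice"]
            came_on = cur["on_ice"] - prev["on_ice"]
            changes.append((midpoint, went_off, came_on))
    return changes
```

The midpoint is only a guess. The real change could have happened anywhere between the two events, which is where the error comes from.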

There's going to be error introduced there, obviously. It gets slightly more concerning when you consider that we know the scorekeepers in different arenas do things differently, sometimes radically. There are well-done papers which detail methods to find the variance in RTSS events between arenas. I feel like I should be taking these rink variances into account if I'm going to use inferred changes. Alternatively, perhaps I should find the mean delta between events in every arena and see if there's a significant difference. That'll be influenced by the tendency of the teams playing to get/give up events as well as game situations and the arena crew, but you'd think 9,225 regular season games between 2007-08 and 2014-15 (approximately 311 games per arena) would be enough to deal with that. The root mean square error here could be interesting, but I'm not sure how seriously to take it. This is an awful lot of consternation over what's likely around 1% of a player's time on ice, and we risk overfitting.
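The "mean delta between events in every arena" check could start from something like the Python sketch below. The input layout (an arena name plus a sorted list of event times per game) is assumed, and it only computes the per-arena means. Deciding whether the differences are significant would need a proper test on top.

```python
from collections import defaultdict
from statistics import mean


def mean_event_gap_by_arena(games):
    """Mean time between consecutive recorded events, grouped by arena.

    games: iterable of (arena_name, event_times) pairs, where event_times
    is a list of seconds elapsed in ascending order (hypothetical layout).
    """
    gaps = defaultdict(list)
    for arena, times in games:
        # Differences between each event and the one that follows it.
        gaps[arena].extend(later - earlier
                           for earlier, later in zip(times, times[1:]))
    # Skip arenas with fewer than two events, which have no gaps.
    return {arena: mean(values) for arena, values in gaps.items() if values}
```

This pools every gap in every game at an arena, so the home team's own style is baked into the number along with the scorekeeper's habits.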

RTSS does output textual shift charts which include shift start and end times: http://www.nhl.com/scores/htmlreports/20152016/TV020688.HTM. It's worth noting these numbers do not match up at all with event-derived time on ice. I could parse these and use them to identify shift changes, but that's going to be complicated. It's also adding another layer of arena bias and human error.

Another option is not inferring anything. Just use raw on-ice event length. The difference between time on ice derived from event records with inferred changes and those without isn't small, however. In my testing it's sometimes on the order of minutes: 5-10% or more. As well, by missing the phantom time we're probably making players and teams look artificially better or worse than they actually are, based on the number of events in their games. By how much is something else to think about.

But this goes back to my initial question: when do we care about a player being on the ice? We're already dealing with data which are surrogates and analogues for what we're actually interested in. There's so much abstraction here already, so shouldn't we want time on ice which is limited by the scope of what we know we have, not what we hope we can extrapolate? We can't empirically and automatically determine when a player changes immediately after making a bad pass that dies on a patch of slushy ice near the hashmarks and gets immediately skated back into his zone for a goal against. He isn't on the ice for the conceded goal (or zone entry, for that matter), and in most rinks it probably isn't recorded as a giveaway. So worrying about adding the extra ~2-3 seconds to the record of that shift in which absolutely nothing happened according to the RTSS data seems somewhat off to me.


^

I get what you're saying and why you'd want that level of detail/accuracy, but from what you describe, I'm guessing it won't be possible without player tracking, and even then there are situations where you would encounter players on the bench being responsible for or contributing to on-ice events.

You might be able to derive something by comparing events which happen within a limited time after a player goes on or off the ice, but the validity of that, and whether the player being off the ice for the event is a positive or a negative, would always be debatable. It assumes that the player had the ability to impact the preceding/following course of events when, the majority of the time, they probably did not.

If we give up a goal immediately after Desharnais goes to the bench, is that because he left his team in a negative situation, or is it because he has a positive effect while on the ice, so the team immediately suffers without him?

I would say that each step you take which leans away from the best available "official" recordings of data takes you a step further from accuracy. You'll likely end up being less accurate by trying to be more accurate. ;)


I would say that each step you take which leans away from the best available "official" recordings of data takes you a step further from accuracy. You'll likely end up being less accurate by trying to be more accurate. ;)

That's my thought as well. I think I'm going to eschew all inferences and just provide raw RTSS timing.


  • 2 weeks later...

It appears that the NHL has intentionally disabled play-by-play records while games are still in progress. So, no more live Corsi charts, and no more player on ice data until the NHL decides to unlock the game record. And if last night's games are any indication, that might be multiple hours after it ends.


Guest habs1952

It appears that the NHL has intentionally disabled play-by-play records while games are still in progress. So, no more live Corsi charts, and no more player on ice data until the NHL decides to unlock the game record. And if last night's games are any indication, that might be multiple hours after it ends.

LOL...do we really want to know?


Guest habs1952

Excerpt from TSN for those interested:

We’ll start with the Canadiens. Here, we will look at every combination of one forward and one defenceman that we have seen play in excess of 60 minutes on the season. We’ll use shot-metric analysis (Corsi% Together) to protect us from the small sample limitations that goal-based analysis will provide. And, we’ll compare every combination of the forward/defenceman combination to the team average - combinations that have worked on the year will glow red, combinations that have failed on the year will glow blue. And, players will be sorted by total ice-time on the season.

How does Montreal look?

[Image: yost.gif]

It’s impossible to not notice that David Desharnais is blue across the board. No matter which defender he’s playing with, his shot-numbers will pale (in most cases, miserably) in comparison to the team average. For a guy who is playing a ton of minutes for this Montreal team, it’s a massive concern. You can’t have a top-six centre getting his teeth kicked in like this and expect to win a Stanley Cup.

Now, Desharnais’ raw numbers aren’t terrible - he’s still breaking even relative to the league average with most of these teammates. So, perhaps this is more about calibrating ice time and usage for head coach Michel Therrien. Would a downgraded role for Desharnais help both him and the team? It strikes me as a very real possibility.

The other big takeaway can be found in Tomas Plekanec’s numbers. For a guy who has been really billed as the team’s shutdown/defensive centre for a few years now, he’s really having an impressive rebound season. Anytime you can compare to the likes of Max Pacioretty and Brendan Gallagher, you’re doing something right.


Now, Desharnais’ raw numbers aren’t terrible - he’s still breaking even relative to the league average with most of these teammates. So, perhaps this is more about calibrating ice time and usage for head coach Michel Therrien. Would a downgraded role for Desharnais help both him and the team? It strikes me as a very real possibility.

I think the majority would agree he is better suited to the 3rd line, not the 1st line.

That's where he was more successful earlier in the season.

Try explaining that to the coach.

Apparently stats don't measure "jam".

And didn't the Habs hire a stats guru? Where is he, and what are they doing with the data he compiles?


And didn't the Habs hire a stats guru? Where is he, and what are they doing with the data he compiles?

They have a consultant, a guy named Matt Pfeffer. http://www.habseyesontheprize.com/latest-news/2015/10/10/9492733/matt-pfeffer-elaborates-on-his-role-with-the-canadiens-habs-analytics-therrien-data-analyst


  • 7 months later...

That's it, yeah. Matplotlib is made for impatient statisticians who just want to get the stupid plot into their journal article and not for bored nerds trying to make pretty pictures for the internet, however. So I'm going back to Python Imaging Library. Which is all low-level and brutal. Pray for mojo, etc.


  • 1 month later...
This topic is now closed to further replies.