I find a problem with basically all studies here (including my own) in that they require a lot of work and time. There are usually many hours of boring research to be done in order to learn about the many factors leading up to the end results. There also seem to basically always be factors "biasing" things, including (of course) "randomness" or "circumstances".
I agree, and that is why I believe that the topic, metrics, methodology, and estimated time involved in such studies should be considered and chosen carefully. It is important that the effect being measured, and the metric used in doing so, are not likely to be overwhelmed by random error and/or factors which can't be removed, quantified, and/or easily assessed.
If an author wishes feedback on a potential or ongoing study, he/she can always start a thread (and even post a link to such in this thread, while requesting feedback in the thread for the study). The thread might contain the specific results to date and could be used to receive feedback about and discuss the various aspects of the study (topic, metrics, methodologies, etc.), especially those factors which the author believes are complicating the study.
(To use a common example, people often try to determine who produced more impressively between peak Gretzky and peak Mario. We can adjust based on mathematical methods, ending up with adjusted stats. But then we also want to know how much teammates affected their stats, or playing system. And in the end, we just end up with more or less arbitrary "feelings" of who did best.)
So, I think "boring" research needs to be done in order to progress, to sort of lay foundations or references to build upon. There are so many more or less automatic assumptions being made, and I think those too need to be closely examined.
Data analysis is obviously limited by the types and quantity of data available. The less the data is able to quantify a variable, the less one is able to analyze such a variable. At least data analysis provides a more objective starting point from which non-quantifiable variables can be considered. If the starting point is wrong, the conclusion is much more likely to be wrong.
IMO this thread is best if not used to discuss individual studies (completed, ongoing or potential). However, since you reference a specific, common type of comparison for which math and data analysis are used, then let me briefly continue for illustrative purposes only.
Let's say one is comparing Gretzky and Lemieux with "simple adjusted" data (adjusted for league games, GPG and assist/goal ratio), but believes there are many other factors not being assessed. This is how I might approach such a comparison. First, let me say that even if we stop with "simple adjusted" data for the two, we very likely have a better starting point than if we used raw data. One might be tempted to stop there and use mental estimates for other factors in the interest of saving time/effort. However, the constraint is often more one of time/effort than of limits in the data. Eventually, one reaches the point of diminishing returns, where the time/effort is too much for the info obtained, and/or the info provided by the data is too little compared to the influence of non-quantifiable factors and random error.
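For anyone unfamiliar with the idea, here is a minimal sketch of what a "simple adjustment" for league games, GPG and assist/goal ratio might look like. The formula, the baseline values, and all names (SeasonContext, BASELINE, simple_adjust) are my own assumptions for illustration, not any particular site's method, and the numbers in the usage lines are made up rather than real seasons.

```python
from dataclasses import dataclass


@dataclass
class SeasonContext:
    schedule_games: int      # games each team played that season
    league_gpg: float        # league-wide goals per game
    assists_per_goal: float  # league-wide assists credited per goal


# Hypothetical baseline environment that every season is scaled to.
BASELINE = SeasonContext(schedule_games=82, league_gpg=6.0, assists_per_goal=1.7)


def simple_adjust(goals, assists, ctx, base=BASELINE):
    """Scale raw goals and assists to the baseline environment.

    Goals are scaled for schedule length and league scoring level.
    Assists get one extra factor for how freely assists were awarded.
    Returns (adjusted goals, adjusted assists, adjusted points).
    """
    schedule = base.schedule_games / ctx.schedule_games
    scoring = base.league_gpg / ctx.league_gpg
    assist_rate = base.assists_per_goal / ctx.assists_per_goal

    adj_goals = goals * schedule * scoring
    adj_assists = assists * schedule * scoring * assist_rate
    return adj_goals, adj_assists, adj_goals + adj_assists


# Made-up example: an 80-game season in a high-scoring league.
high_scoring = SeasonContext(schedule_games=80, league_gpg=8.0, assists_per_goal=1.7)
adj_g, adj_a, adj_pts = simple_adjust(80, 100, high_scoring)
```

Everything discussed after this point is about what such a scaling leaves out, so treat the output as a starting point only.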
In such a comparison, there are often ways one could use other data and/or build upon the studies of others to help filter out other factors and refine the comparison.
- League quality: A study of league quality and/or difficulty could assist in this case (it seems we, and others, have studied such things). Specifically, Lemieux probably played more of his prime years in a league of higher quality (although due to his injuries, the differences aren't as drastic), while if using "simple adjusted" data, Gretzky played more during a time when such data is biased against him to some degree (probably due in large part to factors such as scoring being more balanced between lines).
- Competition: The possibility and likely causes of differing competition should be assessed and taken into account somehow if possible. Specifically, Gretzky's final years and much of (what should have been) Lemieux's late prime were impacted by a large group of forwards from outside Canada, which differs from Gretzky's prime years when such impact was relatively minor. The simplest, yet admittedly imperfect, way I have found to look at this impact in isolation is the thought experiment "what if there were no (or player X, the one being studied, was the only) non-Canadian (or non-North American) player(s) in the NHL? How would this have affected player X's rankings in various categories?" Again, the point is to have a much fairer and better starting point, not to be perfect, since the other choice is much less fair and therefore less useful in considering the impact of this other factor. Without looking at the data, Lemieux should be impacted more by his additional competition, but since both players often were leaders in various categories anyway, I'm not sure if there's much/any difference. This may have been different if Lemieux had played more from '98 to '01.
- Teammates/Linemates: One can look at how each player performed with various linemates and/or teammates, and how he performed without each/all of them. One can also look at how some/all of those players performed without the player being studied. Specifically, I haven't really looked at this effect in depth.
- Team Playing Style: One can look at team performance in such categories as ESGA as a general, imperfect indicator of whether a team was more open or restrictive in playing style. Of course this metric depends on the quality of the defense/goalie, etc., so it's far from perfect, but at least it may give us some important info. Specifically, Gretzky tended to play on much better overall teams than Lemieux, and I think his teams tended to have lower ESGA (but not certain without looking at data).
- Overall Impact: One can look at adjusted plus-minus to see how much better each player's team was with and without him on the ice at even strength. Specifically, Gretzky's adj. +/- is better than Lemieux's, but this becomes complicated by the fact that they had unusual comparisons (Jagr, one of the leaders in this metric, and Messier, who often performed poorly in this metric in large part due to having Gretzky as part of his "off ice"). Also, although of limited value, the win% of the team with and without the player can be examined (if calculated properly). Specifically, Lemieux's teams performed much, much worse without him than with him. This is complicated by the fact that he missed the majority of some seasons (which makes the "with" component much less reliable). Gretzky didn't miss enough games during his prime to have even a decent sample of games from which to assess his overall impact.
- Playoff/International: The importance of this is often overemphasized in proportion to regular season performance. I'm not making a judgement, necessarily, but simply saying that many/most people give it much more importance than its share of games played would suggest. There are some factors that often seem to be mostly or completely neglected when using this metric. First, while most at least know about adjusted numbers, they often cite actual playoff data which is unadjusted and therefore difficult to compare across different periods. Second, if career numbers are used, the proportion of playoff games played during a player's peak/prime vs. career can vary dramatically. Third, while there are smaller differences in strength of schedule during the season, the differences in playoff opponents are much larger. A player on a very strong team will still generally face teams which are worse than his, but on avg. the playoff opponents will be better than the team's regular season opponents. However, a player on a weaker (in playoff terms) or mediocre (in reg. season terms) team will almost always be facing superior opposition, and so his performance can generally be expected to be significantly worse than that of a player on a significantly better team. For instance, Dionne and Kariya are often criticized for their playoff performances, yet they were likely the underdogs in most cases, so their playoff performance would generally be expected to be worse. They also generally could less afford to rest during the regular season, lest their team miss the playoffs. Specifically, without looking at the data more in-depth at present, but using my own adjusted playoff data, I think Gretzky performed slightly better on a prime/career basis, but given the generally better teams he played for, it's difficult to distinguish between them.
- Trophies & Voting: This is another factor often cited by people when evaluating players. What most don't acknowledge is that it is simply quantifying the opinions of alleged "experts." Just how much importance should be given to the opinions of others, given that their choices are often difficult to explain and their credentials may vary substantially? The source data is simply the opinions of a select group (most often sportswriters). Quantifying this does not change this fact. While some interesting work has been done in this field (such as HockeyOutsider's Hart & Norris shares, which place emphasis on how often a player was at the top and/or how close he was to it, rather than simply trophy counting), the source data is completely subjective and this needs to be remembered.
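Two of the ideas in the list above (the "no non-Canadians" thought experiment under Competition, and the with/without comparison under Overall Impact) are simple enough to sketch in a few lines. This is only a rough illustration under my own assumptions: the function names, the data layout, and every number below are hypothetical, and a real version would need actual season data plus some care about ties, games missed mid-season, and so on.

```python
def rank_in(points_by_player, player):
    """Rank of `player` by points (1 = best; tied players share the better rank)."""
    target = points_by_player[player]
    return 1 + sum(1 for pts in points_by_player.values() if pts > target)


def rank_without_non_canadians(points_by_player, is_canadian, player):
    """Thought experiment: re-rank `player` after removing every non-Canadian
    except the player being studied."""
    kept = {
        name: pts
        for name, pts in points_by_player.items()
        if name == player or is_canadian[name]
    }
    return rank_in(kept, player)


def with_without_points_pct(games):
    """Split team results by whether the player dressed.

    `games` is a list of (player_played, team_points) tuples, with 2 points
    available per game. Returns {"with": (pct, n), "without": (pct, n)};
    pct is None when there are no games on that side. The game counts are
    returned too, since a tiny sample makes either side unreliable.
    """
    result = {}
    for label, flag in (("with", True), ("without", False)):
        earned = [pts for played, pts in games if played == flag]
        n = len(earned)
        result[label] = (sum(earned) / (2 * n) if n else None, n)
    return result


# Entirely made-up scoring race: X is the non-Canadian being studied.
points = {"X": 100, "A": 120, "B": 110, "C": 90}
canadian = {"X": False, "A": False, "B": True, "C": True}
actual_rank = rank_in(points, "X")
hypothetical_rank = rank_without_non_canadians(points, canadian, "X")

# Made-up results: four games with the player, two without.
split = with_without_points_pct(
    [(True, 2), (True, 2), (True, 0), (True, 1), (False, 0), (False, 2)]
)
```

Neither function removes the other factors discussed in the list; each just isolates one of them so the subjective part of the discussion starts from a fairer place.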
In summary, while there are usually limits on the availability of and information provided by various data, we often assume that data cannot be used to evaluate various, seemingly non-quantifiable factors, rather than find a way to properly use what data is available to shed further light on the factor being considered. The goal IMO isn't to create some grand unified theory of hockey, but to attempt to provide a more objective starting point for further subjective discussion. It's for the individual to decide whether the importance of quantifying various categories of performance and/or contextual factors is worth the time and effort involved, but we must be careful in declaring such factors as completely unquantifiable and otherwise resorting to completely subjective means of analyzing performance and providing context.