The mistake people make when they think about ChatGPT is assuming it's an inference engine. It's not; it's a language-modeling engine. Factual accuracy is secondary to plausibly predicting the next word, based on the public Natural Language Processing (NLP) datasets it was trained on.
So what are those datasets? Well, here's a popular set:
niderhoff/nlp-datasets (github.com): an alphabetical list of free/public-domain datasets with text data for use in Natural Language Processing (NLP).
In that list, we see two likely candidates for hockey knowledge/discussions: Wikipedia, and ALL REDDIT COMMENTS LOL. So when it's trying to answer a general question like "what kind of trade do the Hurricanes need" it pulls the closest related text from its datasets and assembles a response that is *plausible first* and *factual second*. ChatGPT can bloviate very effectively to look like it knows all about hockey while not actually knowing shit, because that's what hockey chat forums are full of!
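To make "plausible first, factual second" concrete, here's a toy sketch of next-word prediction. This is nothing like ChatGPT's actual architecture (which is a large transformer, not a bigram counter), and the corpus below is invented forum-style chatter, but the core move is the same: pick the continuation that's statistically likely, with no fact-checking step anywhere.

```python
from collections import Counter, defaultdict

# Hypothetical training text -- imagine scraped hockey-forum chatter.
corpus = (
    "the hurricanes need a trade . "
    "the hurricanes need a goalie . "
    "the hurricanes need a winger . "
    "the hurricanes won the cup ."
).split()

# Count bigrams: how often each word follows each other word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word(prev):
    """Return the most probable next word -- plausible, not necessarily true."""
    return bigrams[prev].most_common(1)[0][0]

print(next_word("need"))  # "a" -- the likeliest continuation
print(next_word("the"))   # "hurricanes" -- most frequent follower
```

Nowhere in that loop is there a notion of truth: the model happily emits whatever the forums said most often.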
Note that when you ask a *specific and simple factual question* it can often find the correct factual response, because the most plausibly predictive answer, in those cases, is almost always the actual fact.
Me: Who does Sebastian Aho play for?
ChatGPT: Sebastian Aho currently plays for the Carolina Hurricanes of the National Hockey League (NHL).
Note also that when one primes ChatGPT with the correct facts, it is more likely to make the proper inferences using those facts in subsequent sessions:
Me: Who won the Stanley Cup in 2006?
ChatGPT: The Carolina Hurricanes won the Stanley Cup in 2006.
(Subsequent session)
Me: Have the Carolina Hurricanes ever won a Stanley Cup?
ChatGPT: Yes, the Carolina Hurricanes have won the Stanley Cup once, in 2005-2006 NHL season.
The more complex or vague the initial question, the less likely the model is to give you a correct answer, because that requires inference, a skill ChatGPT does not currently optimize for.
f***ing with ChatGPT is fun, but if you want to understand it, read the literature. It really is fascinating. Here's a good place to start:
(Note that there's a parallel model called InstructGPT that optimizes for more "truthful" results. That's discussed in the above paper.)