* * *
Yes, but there's a difference between logical inference and pattern-based inference. When they say "inference" they mean Bayesian inference. When humans say "inference" we tend to mean logical inference.
Large Language Models can do pattern-based inference at truly mind-boggling scale because they have been built by massively parallelized processes grinding through unsupervised "fill in the missing word" cloze-style exercises trillions of times. But they fail at actual logical inference in shockingly basic cases. Because again: the LLM's entire goal is to predict *the next plausible word* given its many, many, *many* inputs. Plausibility, not accuracy. And since ChatGPT's remit is to respond, it will always respond. It won't say "gee, that's tricky, let me think about it." It will come up with the most plausible string of words and hand them over, even if they are "hallucinations" (an actual term of art for wildly nonsensical output, and probably what your Blue Velvet example was).
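To make "plausibility, not accuracy" concrete, here's a deliberately toy sketch. The words and probabilities below are invented for illustration, not real model output; the point is that the sampling step only ever sees relative plausibility, so a wrong-but-plausible continuation can win and nothing downstream flags it as false.

```python
import random

# Invented next-token distribution for the prompt
# "Blue Velvet was directed by"; these numbers are made up
# for illustration, not pulled from any real model.
next_token_probs = {
    "David":    0.58,  # plausible and, as it happens, correct
    "Dennis":   0.22,  # plausible but wrong (an actor in the film)
    "Isabella": 0.13,  # plausible but wrong (also an actor)
    "a":        0.07,  # grammatical but useless
}

def next_word(probs):
    """Pick a continuation weighted purely by plausibility.

    Nothing here consults a fact; the only signal is which words
    tend to follow which words in the training data.
    """
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights, k=1)[0]

print(next_word(next_token_probs))
```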
(If you ever played with the GPT-3 playground that predated ChatGPT, you would have seen actual settings, temperature and top-p, that let you dial how confident or adventurous the sampling was. Interesting that ChatGPT doesn't expose those settings.)
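For what it's worth, the temperature knob works roughly like this. A minimal sketch with made-up scores, not anything pulled from a real model:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into a probability distribution, scaled by temperature.

    Low temperature sharpens the distribution (the model commits hard to
    its top guess); high temperature flattens it (more varied output,
    more chances to wander into nonsense).
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented scores for three candidate next words
logits = [4.0, 3.0, 1.0]
for t in (0.2, 1.0, 2.0):
    probs = [round(p, 3) for p in softmax_with_temperature(logits, t)]
    print(f"temperature={t}: {probs}")
```

At temperature 0.2 the top word soaks up essentially all the probability; at 2.0 the also-rans get a real shot.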
Note also that ChatGPT never notices its errors until the user calls attention to them; that call-out is just another input that ChatGPT must then evaluate, and I suspect that if the user says "you are wrong about fact X" then ChatGPT must treat that input with 100% confidence.
It's also notable how strongly the quality of output from a ChatGPT session depends on which data you choose to pull from it first. I suspect that if you ask it a series of factual questions, and accept those responses as valid in the subsequent conversation, its confidence in those earlier exchanges goes way, way up, and so they get weighted in follow-up answers in ways they otherwise would not be.
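To make the mechanics concrete: as far as anyone outside OpenAI can tell, a ChatGPT conversation is just accumulated text. Every earlier answer you accepted, and every correction you issue, gets folded back into the context that conditions the next completion. A rough sketch, with a hypothetical transcript format (the real formatting isn't public):

```python
# Hypothetical transcript structure; the actual formatting ChatGPT
# wraps around a conversation is not public.
transcript = [
    ("user",      "Who directed Blue Velvet?"),
    ("assistant", "Blue Velvet was directed by David Lynch."),
    ("user",      "Right. And who starred in it?"),
    ("assistant", "It starred Isabella Rossellini, Kyle MacLachlan, and Dennis Hopper."),
    ("user",      "You are wrong about the director; it was someone else."),
]

def build_context(turns):
    """Flatten every prior turn, corrections included, into the prompt
    that conditions the next prediction."""
    lines = [f"{role}: {text}" for role, text in turns]
    return "\n".join(lines) + "\nassistant:"

print(build_context(transcript))
```

Note that the (mistaken) correction in the last turn goes in as plain text with no truth-checking attached; whatever the user asserts simply becomes part of the conditioning for everything that follows.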
The consistency of certain responses about ChatGPT's limitations also suggests, to me, a separate model or process devoted to acknowledging its own error bars and explaining its internal mechanisms to users who are continually surprised by the rudimentary mistakes it makes. You see the same answers over and over again: "I apologize for any confusion" is something ChatGPT probably says a million times a day at this point.
Anyway. Truly fascinating shit.