I wanted to extract some crime statistics broken down by type of crime and by population group, all of course normalized by population size. I got a nice set of tables summarizing the data for each year I requested.
When I shared these summaries, I was told they are entirely unreliable due to hallucinations. So my question to you is: how common a problem is this?
I compared results from ChatGPT-4, Copilot and Grok, and the results are the same (Gemini says the data is unavailable, btw :)
So are LLMs reliable for research like this?
No. Of course not. They’re not reliable for anything. They don’t have any kind of database of facts and don’t know or attempt to know anything at all.
They’re just a more advanced version of your phone’s predictive text. All they do is predict which words are most likely to come next, and in what order, in response to the prompt. That’s it. There is no logic of any kind dictating what an LLM outputs.
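To make the "predictive text" point concrete, here's a toy sketch in Python. It's a word-level bigram counter, which is far simpler than a real LLM (those use neural networks over tokens with a long context window), so treat it only as an analogy. The corpus is made up for illustration, and `train_bigrams`/`generate` are hypothetical helper names. The model just keeps appending whichever word most often followed the previous one:

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, length):
    """Greedily append the most frequent follower of the last word."""
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no word ever followed this one in the training text
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

# Made-up "training data": violent crime rose in 2019, property crime fell in 2020.
corpus = ("violent crime rose in 2019 . "
          "property crime fell in 2020 . "
          "property crime fell in 2020 .")
model = train_bigrams(corpus)
print(generate(model, "violent", 4))  # prints "violent crime fell in 2020"
```

The output is fluent and looks like a statistic, but the "training data" says the opposite: violent crime *rose* in 2019. The model never checked anything. "Fell in 2020" simply followed "crime" more often. That is the same failure mode as a confident-looking table of numbers.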