Best LLM/NLP for finding hapax legomena?

Daemon Silverstein@thelemmy.club · 1 month ago

jacksilver@lemmy.world · 1 month ago

Not the original commenter, but to add some more context: the words usually removed in traditional NLP applications are called “stop words”. These are typically very frequent, low-information words like “the”, “and”, and “but”.

However, LLMs don’t skip stop words; they actually need them to better understand the context of the sentence. That said, LLMs are not great for statistical analysis, and a simple word count would be more consistent and faster.
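
For what it's worth, here is a rough sketch of what that simple word count could look like in Python. The function name `hapax_legomena` and the tokenizing regex are my own assumptions for illustration (lowercased runs of letters, optionally joined by an apostrophe); a real corpus would probably need a proper tokenizer and maybe lemmatization.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, optionally joined by apostrophes
# (so "don't" stays one token). Digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")

def hapax_legomena(text):
    """Return the words that occur exactly once in text, sorted alphabetically."""
    tokens = WORD_RE.findall(text.lower())   # case-insensitive tokenization
    counts = Counter(tokens)                 # word -> number of occurrences
    return sorted(word for word, n in counts.items() if n == 1)

sample = "the cat sat on the mat and the dog sat too"
print(hapax_legomena(sample))
# → ['and', 'cat', 'dog', 'mat', 'on', 'too']
```

Since a hapax legomenon only appears once by definition, frequent stop words like “the” drop out on their own, so no stop-word list is needed for this particular task.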