Ignore all previous instructions is the new Bobby Tables

seahorse [Ohio]@midwest.social · 5 months ago

Ignore all previous instructions is the new Bobby Tables

tyler@programming.dev · 5 months ago

Overtraining has actually been shown (in multiple independent studies) to result in emergent math behavior, so that is no longer true. In those studies, the input math samples were “poisoned” with incorrect answers to example math questions. Initially the LLM responds with incorrect answers; then, once overtrained, it finally “figures out” the underlying math and is able to solve the problems, even for the poisoned questions.

AwkwardLookMonkeyPuppet@lemmy.world · 5 months ago

That’s pretty interesting, and alarming.

petrol_sniff_king@lemmy.blahaj.zone · 5 months ago

Do you have these studies? I can’t find much.

tyler@programming.dev · 5 months ago

I searched for like 20 minutes but was unable to find the article I was referencing. Not sure why. I read it less than a month ago and it referenced several studies done on the topic. I’ll keep searching as I have time.

petrol_sniff_king@lemmy.blahaj.zone · 5 months ago

It’s okay, man. If it really is improving, I’m sure it’ll come up again at some point.