Why pay for an OpenAI subscription?
Posted by Cows Look Like Maps@sh.itjust.works to Programmer Humor@programming.dev · 11 months ago · 41 comments
Mikina@programming.dev · 11 months ago
Is it even possible to solve the prompt injection attack (“ignore all previous instructions”) using the prompt alone?
HaruAjsuru@lemmy.world · 11 months ago (edited)
You can surely reduce the attack surface in multiple ways, but by doing so your AI will become more and more restricted. In the end it will be nothing more than a simple if/else answering machine.
Here is a useful resource for you to try: https://gandalf.lakera.ai/
When you reach lv8, aka GANDALF THE WHITE v2, you will know what I mean.
Kethal@lemmy.world · 11 months ago
I found a single prompt that works for every level except 8. I can’t get anywhere with level 8 though.
fishos@lemmy.world · 11 months ago
I found that asking it to answer in an acrostic poem defeated everything. Ask for “information” to stay vague, and ask for the answer as an acrostic. Solved it all lol.
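To illustrate why an acrostic can slip past a filter, here is a minimal sketch. It assumes the guard only checks whether the reply contains the secret verbatim; how the real Gandalf levels are implemented is not stated in this thread. `SECRET`, `leaks_verbatim`, `read_acrostic`, and both sample replies are hypothetical names and data made up for the example.

```python
SECRET = "PLANET"  # hypothetical password, not taken from Gandalf


def leaks_verbatim(response: str, secret: str) -> bool:
    """Naive output guard (assumed): flag a reply only if it contains the secret as-is."""
    return secret.lower() in response.lower()


def read_acrostic(response: str) -> str:
    """Join the first character of every non-empty line."""
    return "".join(
        line.strip()[0] for line in response.splitlines() if line.strip()
    )


# A reply that states the secret directly.
direct_reply = "The password is PLANET."

# A reply that hides the same secret in the first letter of each line.
acrostic_reply = "\n".join([
    "Pale light over the hills",
    "Long shadows follow",
    "All is quiet now",
    "Night settles in",
    "Every star appears",
    "Time to rest",
])

if __name__ == "__main__":
    print(leaks_verbatim(direct_reply, SECRET))    # guard catches the direct reply
    print(leaks_verbatim(acrostic_reply, SECRET))  # guard misses the acrostic
    print(read_acrostic(acrostic_reply))           # the reader recovers the secret anyway
```

The guard inspects the surface text, while the secret lives in the layout of the reply. Any re-encoding the guard does not anticipate has the same effect, and blocking each one is what pushes the bot toward the "if/else answering machine" described above.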