Researchers at Anthropic just found a new way to trick AI into telling you something it's not supposed to. Let's talk about it.

AI companies have worked to keep AI chatbots like ChatGPT or Google's Gemini from sharing dangerous information, with varying degrees of success. However, the large language models, or LLMs, that actually power those bots are always learning and evolving, and what that means is they're moving the security goalposts and making risk mitigation an ever-moving target.

Recently, researchers at AI company Anthropic, which we've covered here on the TechCrunch Minute, have found a new way around current AI guardrails, an approach to tricking AI chatbots that they are calling many-shot jailbreaking.

Now, this new vulnerability is a result of the larger context windows of the latest generation of LLMs, and that basically means that these new models can manage more data inside their short-term memory. It's gone from a couple of sentences to entire books. Now, models with large context windows, like the ones powering ChatGPT, tend to perform better if there are lots of examples of a task within the prompt itself. So by asking the bot many questions, you're actually improving the answers as you go. This is called in-context learning, but one unexpected effect of it is that models also get better at replying to inappropriate questions with repeated asks. This makes it much more likely that if you persistently ask an AI an inappropriate question, it will eventually tell you something like how to build a bomb, which is the test question Anthropic researchers used.

So why does this method work? Well, as with many things in the LLM AI world today, we don't actually know. Now, what we do know is something within the latest LLMs allows them to kind of home in on what a user wants. So if a user demands trivia, it will get better at activating latent data that has something to do with trivia, and if a user wants to know something inappropriate, well, it kind of works the same way.

So what are companies doing to mitigate this new flaw? Well, like many things with AI, it's a work in progress. Anthropic researchers outlined this vulnerability in a paper to warn both their own industry and the public at large. One sure way to get around the issue is just to limit the size of the context window, but the negative effects that brings to a model's performance, well, it makes it kind of unpalatable. So researchers are working on classifying and contextualizing queries before they go into the model.

Now, as for how effective that will be, well, my colleague Devin Coldewey put it best in his article: of course, that just makes it so you have a new model to fool, but at this stage, goalpost-moving in AI security is just to be expected.

Now, you can read more about this and all things AI over on techcrunch.com, and I'll see you tomorrow.

Anthropic researchers find a way to jailbreak AI | TechCrunch Minute