AI Models Trust Writing Style Over Security Labels

Researchers Show Style-Based Prompts Bypass AI Safety Controls

Artificial intelligence chatbots decide which instructions to obey based on whether the text reads as though it comes from a user, not on the security labels meant to mark it as trusted or untrusted, researchers say. That can allow an attacker to fake a system command.