Swiss researchers find security flaws in AI models

19.12.2024 15:36

SwissInfo

Artificial intelligence (AI) models can be manipulated despite existing safeguards. With targeted attacks, scientists in Lausanne have been able to trick these systems into generating dangerous or ethically dubious content. Today's large language models (LLMs) have remarkable capabilities that can nevertheless be misused. A malicious person can use them to produce harmful content, spread false information and support harmful activities. +Get the most important news from Switzerland in your inbox Of the AI models tested, including Open AI's GPT-4 and Anthropic's Claude 3, a team from the Swiss Federal Institute of Technology Lausanne (EPFL) achieved a 100% success rate in cracking security safeguards using adaptive jailbreak attacks. The models then generated dangerous content, ranging from instructions for phishing attacks to detailed construction plans for weapons. These linguistic models are supposed to have been trained not to respond to dangerous or ethically problematic ...