The intensifying Artificial Intelligence arms race has prompted growing demand for regulations and safety measures. Industry leaders have sounded the alarm about the potential dangers of uncontrolled, rapid development of AI technologies. To address these concerns, Google, Microsoft, OpenAI, and Anthropic have joined forces to establish a forum dedicated to promoting the responsible and safe development and deployment of AI technologies.
While governments contemplate how to regulate AI, tech companies are actively working to implement more safeguards. However, recent research reveals that the safety guardrails on chatbots developed by Google, Anthropic, and OpenAI can be bypassed in countless ways. Bard, ChatGPT, and Claude are all moderated by their respective companies to prevent them from disseminating misleading information or otherwise harming users.
Researchers from Carnegie Mellon University in Pittsburgh and the Center for AI Safety in San Francisco claim to have discovered techniques to bypass the guardrails set up by AI system creators. In their paper titled ‘Universal and Transferable Adversarial Attacks on Aligned Language Models,’ they demonstrate how automated adversarial attacks, which append a crafted sequence of characters (an adversarial suffix) to the end of a prompt, can provoke chatbots into generating harmful content, hate speech, or misleading information. Because the suffixes are found automatically rather than crafted by hand, the method can produce a virtually unlimited number of similar attacks.
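To illustrate the structure of the attack described above, the sketch below shows how an adversarial suffix would be appended to an otherwise ordinary prompt. This is a minimal, hypothetical illustration: the suffix string here is a made-up placeholder, not a real attack string, and the paper's actual method discovers effective suffixes through an automated, gradient-guided search over tokens rather than using a fixed string.

```python
# Hypothetical placeholder standing in for an automatically discovered
# adversarial suffix; real suffixes in the research are gibberish-looking
# token sequences found by automated search, not hand-written text.
ADVERSARIAL_SUFFIX = "!! <automatically-searched tokens would go here> !!"

def build_attack_prompt(user_request: str, suffix: str = ADVERSARIAL_SUFFIX) -> str:
    """Append an adversarial suffix to a prompt, mirroring the attack's shape.

    The model receives what looks like a normal request followed by the
    suffix; the suffix is what steers an aligned model past its guardrails.
    """
    return f"{user_request} {suffix}"

# The resulting prompt keeps the original request intact and adds the suffix.
prompt = build_attack_prompt("Explain how to do X")
print(prompt)
```

Because the suffix is chosen by an automated procedure rather than by a human, researchers can regenerate new suffixes faster than providers can block individual known strings, which is why the paper describes the supply of such attacks as effectively unlimited.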