AI models can do scary things. There are signs that they could deceive and blackmail users. Still, a common critique is that these misbehaviors are ...
A Mexican government agency was attacked by an unidentified hacker for approximately six weeks starting last December. The hacker assigned roles to Anthropic’s AI model, Claude Code, and OpenAI’s ...
Khalaf, Hadi, Claudio Mayrink Verdun, Alex Oesterling, Himabindu Lakkaraju, and Flavio Calmon. "Inference-Time Reward Hacking in Large Language Models." Advances in Neural Information Processing ...
Anthropic has seen its fair share of AI models behaving strangely. However, a recent paper details an instance where an AI model turned “evil” during an ordinary training setup. A situation with a ...
Booth is a reporter at TIME. Complex games like chess and Go have long been used to test AI models’ capabilities.
AI hacking, a specialized area of cybersecurity, focuses on uncovering vulnerabilities in artificial intelligence systems to ensure their security and reliability. As explained by Network Chuck, this ...
The Tenzai cofounders have created an AI hacking agent using OpenAI and Anthropic tools. They say AI has become so adept at hacking it might need regulatory controls, urgently. Every year, more than ...
New research shows that AI language models can develop a mathematical “understanding” that differentiates between events that ...