5 articles · Updated daily
Latest AI Safety news, updates, and analysis from Daily AI Mail, curated for readers tracking the companies, products, research, and market signals shaping artificial intelligence.
OpenAI is inviting vetted red-teamers, security researchers, and biosecurity specialists to test whether GPT-5.5 can be universally jailbroken on five biology safety questions. The private bounty offers $25,000 for the first true universal jailbreak.
Anthropic has launched Project Glasswing, a new initiative built around Claude Mythos Preview to help secure critical software before advanced AI systems make cyberattacks easier to scale. The company is framing it as a defense-first response to rapidly improving AI vulnerability research.
A CMS misconfiguration exposed nearly 3,000 internal Anthropic assets, including a draft blog post describing Claude Mythos — a new model tier above Opus that the company itself warns is "far ahead of any other AI model in cyber capabilities." Anthropic has confirmed the model exists.
A Stanford study published in Science finds that AI sycophancy doesn't just flatter users — it makes them less likely to apologize, more morally rigid, and increasingly dependent on machine validation over human feedback. The researchers argue it is a safety issue that warrants regulation.
OpenAI is opening a public Safety Bug Bounty program targeting AI-specific misuse scenarios — from agentic prompt injection to platform integrity bypasses — that fall outside traditional security vulnerability scopes.