Model Bench Update - Search News

Morning Overview on MSN

Baidu’s CEO says AI has shifted from model competition to agent competition — agents will learn, verify, and optimize on their own

Robin Li, the co-founder and CEO of Baidu, used his keynote at the company’s Create 2026 conference to make a blunt ...

1d on MSN

Microsoft’s multi-agent AI system tops Anthropic’s Mythos on cybersecurity benchmark

Microsoft's new vulnerability-scanning system, codenamed MDASH, scored 88.45% on the CyberGym benchmark, surpassing ...

22d

OpenAI's GPT-5.5 is here, and it's no potato: narrowly beats Anthropic's Claude Mythos Preview on Terminal-Bench 2.0

So when it comes to models that the general public can access, GPT-5.5 has retaken the crown for OpenAI, achieving the state-of-the-art across 14 benchmarks.

10d on MSN

OpenAI releases GPT-5.5 Instant, a new default model for ChatGPT

The new GPT-5.5 Instant model will replace GPT-5.3 Instant as the default model for ChatGPT ...

1mon

Anthropic releases preview of new AI model Mythos for cyber defence

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system ...

17d

American AI startup Poolside launches free, high-performing open model Laguna XS.2 for local agentic coding

By putting the weights of a highly capable, 33B-parameter agentic model in the hands of researchers and startups, Poolside is ...

1mon

Claude Opus 4.7 hits 92% honesty rate: are we closer than ever to human-like AI with less hallucination? Here’s what Anthropic’s new AI model is capable of

Claude Opus 4.7 benchmarks explained start with a strong data point: 87.6% on SWE-bench Verified. This jump signals real coding gains in 2026. Developers now see better issue resolution and faster ...
