Large Language Models Benchmarks

How an AI system learned to write expert-level scientific code

ERA is an AI system that uses large language models and tree search to automatically write, test, and refine scientific ...

European Medical Journal

Advanced AI Language Model Outperforms Physicians in Reasoning Tasks

A large language model outperformed physicians in diagnostic reasoning tasks, highlighting the potential for AI in clinical care.

Voice of Alexandria

Google DeepMind Features Hirundo’s Security-Hardened Gemma 4 Model – Outperforms LLMs 170x Its Size on Security

Gemma-4-E4B-IT (Unlearned) is the smallest and most secure large language model among the sampled models. The hardened model ...

10d

What is DeepSeek? Everything a marketer needs to know

WebFX reports that DeepSeek, an AI LLM, enhances marketing tasks, proving effective in content creation, customer support, ...

17d

How Sakana trained a 7B model to orchestrate GPT, Claude and Gemini LLMs

Claude Sonnet 4, and Gemini 2.5 Pro dynamically — no hardcoded pipelines, fewer tokens than competing frameworks.

Computer Weekly

Large language models provide unreliable answers about public services, Open Data Institute finds

Popular large language models (LLMs) are unable to provide reliable information about key public services such as health, taxes and benefits, the Open Data Institute (ODI) has found. Drawing on more ...

Forbes

Can Quantum-Inspired AI Compete With Today’s Large Language Models?

As large language models (LLMs) continue their rapid evolution and domination of the generative AI landscape, a quieter evolution is unfolding at the edge of two emerging domains: quantum computing ...

Quanta Magazine

To Make Language Models Work Better, Researchers Sidestep Language

Language isn’t always necessary. While it certainly helps in getting across certain ideas, some neuroscientists have argued that many forms of human thought and reasoning don’t require the medium of ...
