Audio Visual Language Model

News9Live on MSN

Google’s new Gemma 4 12B AI model brings powerful multimodal intelligence to everyday laptops

Google has launched Gemma 4 12B, a new open-source multimodal AI model that supports text, image, and native audio inputs ...

VentureBeat

Google's new open source Gemma 4 12B analyzes audio, video — and runs entirely locally on a typical 16GB enterprise laptop

While many AI open source model providers are pursuing larger and more powerful models, Google is still giving attention to the smaller, more ...

SiliconANGLE

Audio language model startup Gradium raises $70M to create more realistic voice AI systems

Audio artificial intelligence startup Gradium is launching today after closing on an impressive $70 million seed funding round, just three months after it was founded. The startup is backed by ...

Ars Technica

Microsoft unveils AI model that understands image content, solves visual puzzles

On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ ...

Slator

Alibaba Updates Speech Translation Model, Triples Language Coverage

Alibaba expands its AI live speech translation model from 18 to 60 languages, adding real-time voice cloning and reducing ...

VentureBeat

China's Alibaba challenges U.S. tech giants with open source Qwen3-Omni AI model accepting text, audio, image and video

U.S. tech giants are facing a reckoning from the East. Even as Nvidia pledged today to invest a staggering $100 billion into its own customer OpenAI's data centers — a move that raised eyebrows across ...

Ars Technica

Google’s PaLM-E is a generalist robot brain that takes commands

On Monday, a group of AI researchers from Google and the Technical University of Berlin unveiled PaLM-E, a multimodal embodied visual-language model (VLM) with 562 billion parameters that integrates ...
