Google has launched Gemma 4 12B, a new open-source multimodal AI model that supports text, image, and native audio inputs ...
Credit: VentureBeat made with OpenAI ChatGPT-Images-2.0 While many AI open source model providers are pursuing larger and more powerful models, Google is still giving attention to the smaller, more ...
Audio artificial intelligence startup Gradium is launching today after closing on an impressive $70 million seed funding round, just three months after it was founded. The startup is backed by ...
On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ ...
Alibaba expands its AI live speech translation model from 18 to 60 languages, adding real-time voice cloning and reducing ...
U.S. tech giants are facing a reckoning from the East. Even as Nvidia pledged today to invest a staggering $100 billion into its own customer OpenAI's data centers — a move that raised eyebrows across ...
On Monday, a group of AI researchers from Google and the Technical University of Berlin unveiled PaLM-E, a multimodal embodied visual-language model (VLM) with 562 billion parameters that integrates ...