Abstract: Recent advances in Speech Large Language Models (Speech LLMs) have paved the way for unified architectures across diverse speech understanding tasks. However, prevailing alignment paradigms ...
Abstract: This paper presents an ensemble framework for predicting semantic audio-text alignment, developed for GC-12, x-to-audio alignment (XACLE), in the ICASSP 2026 SP Grand Challenge. We leverage ensemble ...