Alibaba’s Qwen division has released Qwen3.5-Omni, a new omnimodal AI model. The system processes text, images, audio, and video content simultaneously. It accepts more than 10 hours of audio input and supports a context window of 256,000 tokens. Enhanced multilingual capabilities improve both speech recognition and speech generation.
Alibaba claims the Qwen3.5-Omni-Plus variant outperforms Google’s Gemini 3.1 Pro on audio benchmarks covering audio understanding, reasoning, recognition, and translation. According to the company, the model also matches Gemini 3.1 Pro in overall audio-visual understanding. Alibaba offers the model in three sizes via an API.