Alibaba has launched its Qwen3.5-Omni multimodal model. According to PANews, the Qwen3.5-Omni series comes in Plus, Flash, and Light versions, all of which support a 256k context length. The model can handle more than 10 hours of audio input and more than 400 seconds of 720p audio-visual input at 1 FPS. It was pre-trained on large volumes of text and visual data and on more than 100 million hours of audio-visual data, and is described as showing exceptional multimodal perception and generation capabilities. Compared with its predecessor, Qwen3-Omni, Qwen3.5-Omni has significantly stronger multilingual capabilities, supporting speech recognition in 113 languages and dialects and speech generation in 36 languages and dialects.