According to PANews, the Qwen team has open-sourced Qwen2.5-VL-32B-Instruct, a model with 32 billion parameters. The model performs strongly in tasks such as image understanding, mathematical reasoning, and text generation. It has been further optimized with reinforcement learning so that its responses align more closely with human preferences, and it surpasses the previously released 72B model on multimodal evaluations such as MMMU and MathVista.
The 32B model brings three main improvements over earlier models in the Qwen2.5-VL series. First, its responses better match human subjective preferences: the output style has been adjusted to give more detailed, better-formatted answers. Second, its mathematical reasoning has improved significantly, yielding higher accuracy on complex mathematical problems. Third, in image understanding and reasoning, it shows greater accuracy and finer-grained analysis in tasks involving image parsing, content recognition, and visual logic deduction.