According to Decrypt, Paris-based startup Mistral AI has released Mixtral, an open large language model (LLM) that reportedly outperforms OpenAI's GPT-3.5 on several benchmarks while being more efficient. The company recently claimed a $2 billion valuation and received substantial Series A investment from venture capital firm Andreessen Horowitz (a16z), with participation from tech giants Nvidia and Salesforce.
Mixtral uses a technique called sparse mixture of experts (MoE), which Mistral says makes the model more powerful and efficient than its predecessor, Mistral 7B, and even its larger competitors. MoE is a machine learning technique in which a model is built from several smaller "expert" sub-networks. When a prompt comes in, a router network selects, for each token, the few experts best suited to process it and combines their outputs, so only a fraction of the model's parameters is active at any one time.
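To make the idea concrete, the sketch below shows a minimal top-2 sparse MoE layer in PyTorch. It illustrates the general technique rather than Mistral's actual implementation; the layer sizes, the GELU activation, and the simple routing loop are assumptions chosen for clarity.

```python
# Illustrative sparse mixture-of-experts (MoE) layer: a router scores all
# experts per token, only the top-k experts (here k=2, mirroring Mixtral's
# 2-of-8 routing) are run, and their outputs are combined with router weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, dim: int, hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (tokens, dim)
        scores = self.router(x)                               # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)     # top-k experts per token
        weights = F.softmax(weights, dim=-1)                   # normalize their weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Example: route 4 tokens of width 16 through the layer.
layer = SparseMoELayer(dim=16, hidden=64)
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```

Only the chosen experts are evaluated for each token, which is why a sparse MoE model can have a large total parameter count while keeping per-token compute close to that of a much smaller dense model.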
Mistral AI claims that Mixtral has 46.7 billion total parameters but uses only 12.9 billion parameters per token, allowing it to process input and generate output at the speed and cost of a 12.9-billion-parameter model. The company also states that Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference, and that it matches or beats GPT-3.5 on most standard benchmarks.
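As a back-of-the-envelope check on those figures, and assuming two of eight expert blocks are active per token on top of shared layers (in line with Mixtral's published 8x7B, top-2 routing design), the per-expert and shared parameter counts can be recovered from the two published totals:

```python
# Illustrative arithmetic derived from the published totals; the 2-of-8
# routing assumption comes from Mixtral's 8x7B design, not from this article.
total_params  = 46.7e9   # all 8 expert blocks counted
active_params = 12.9e9   # only 2 expert blocks counted (per token)

# total = shared + 8 * expert_block, active = shared + 2 * expert_block
expert_block = (total_params - active_params) / 6   # ≈ 5.6B per expert block
shared = active_params - 2 * expert_block            # ≈ 1.6B shared (attention, embeddings, ...)
print(f"per-expert ≈ {expert_block/1e9:.1f}B, shared ≈ {shared/1e9:.1f}B")
```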
Mixtral is licensed under the permissive Apache 2.0 license, allowing developers to freely inspect, run, modify, and build custom solutions on top of the model. However, there is debate about whether Mixtral is truly open source: Mistral has released only "open weights," and the core model's license prevents its use to compete against Mistral AI. The startup has also not published the training dataset or the code used to create the model, as would be expected of a fully open-source project.
Mistral AI says Mixtral has been fine-tuned to perform exceptionally well in languages other than English, handling French, German, Spanish, and Italian in addition to English. An instruction-tuned version, Mixtral 8x7B Instruct, was also released for careful instruction following; it scored 8.3 on the MT-Bench benchmark, making it the best open-source model on that benchmark at the time. Mixtral is available to download via Hugging Face, and users can also try the instruct version online.
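For those who want to run it locally, a minimal loading sketch with the Hugging Face transformers library might look like the following. The repository id, dtype, and device settings are assumptions for illustration; the unquantized model needs tens of gigabytes of GPU memory, so quantized variants are often used instead.

```python
# Minimal sketch: loading and prompting Mixtral 8x7B Instruct via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed Hugging Face repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",          # shard layers across available GPUs (needs accelerate)
)

messages = [{"role": "user", "content": "Summarize mixture of experts in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```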