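Architecture: Mistral Large 2 uses a Transformer decoder architecture. It is a “dense” model, meaning that all of its parameters are used for every token, in contrast to mixture-of-experts models, which activate only a subset of their parameters per token.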
Parameters: It has 123 billion parameters, which lets it handle complex language tasks with nuance. Mistral AI sized the model for single-node inference, so that it can run at high throughput on a single node.
Context window: It has a context window of 128,000 tokens, which helps maintain coherence and relevance across long conversations or documents.
Multilingual support: Mistral Large 2 supports dozens of languages, including Russian, Chinese, Japanese, Korean, Spanish, and Italian.
Programming languages: It supports more than 80 programming languages, including Python, Java, C, C++, and JavaScript.
Performance: Mistral Large 2 performs strongly on a range of benchmarks and competes with models such as OpenAI’s GPT-4o and Meta’s Llama 3.1 405B. On WildBench it placed second, behind GPT-4o. On Arena Hard it placed third, behind GPT-4o and Claude 3.5 Sonnet.
Function calling: According to Mistral AI’s benchmarks, Mistral Large 2 outperforms GPT-4o and Claude 3.5 Sonnet at function calling.
Efficiency: According to Mistral AI, Large 2 sets a new standard for performance relative to the cost of serving the model.
Reduced hallucinations: Mistral AI focused on minimizing inaccurate output, fine-tuning the model to respond more cautiously and to acknowledge when it lacks enough information to give a confident answer. Mistral AI also claims that Large 2 produces more concise responses than other leading models.
Licensing: The model weights are available under the Mistral Research License, which permits use and modification for research and non-commercial purposes. Commercial use requires a Mistral Commercial License.
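
The single-node sizing mentioned under Parameters can be sanity-checked with rough arithmetic. The sketch below estimates the memory needed for the weights alone at a few numeric precisions and compares it with a hypothetical node of eight 80 GB accelerators. The node size and the precisions are illustrative assumptions, not Mistral AI's stated deployment configuration, and the estimate ignores the KV cache, activations, and framework overhead.

```python
# Rough weight-memory estimate for a dense 123B-parameter model.
# Assumptions (not from Mistral AI): an 8 x 80 GB node, weights only.

PARAMS = 123_000_000_000          # parameter count quoted in the article
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}
NODE_MEMORY_GB = 8 * 80           # hypothetical single node


def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Return the weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits_on_node(params: int, bytes_per_param: int, node_gb: float) -> bool:
    """True if the weights alone fit in the node's total accelerator memory."""
    return weight_memory_gb(params, bytes_per_param) <= node_gb


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(PARAMS, size)
        fits = fits_on_node(PARAMS, size, NODE_MEMORY_GB)
        print(f"{name}: {gb:.0f} GB, fits on node: {fits}")
```

Under these assumptions the bf16 weights come to about 246 GB, which leaves part of such a node's memory free for the cache that long contexts require.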
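
Function calling, listed above, generally means that the model returns a structured request naming a function and its arguments, which the application then executes. The sketch below shows only the application side: a tool described with a JSON Schema, and a dispatcher that runs a tool call. The `get_weather` tool, the `dispatch` helper, the `TOOLS` layout, and the hard-coded `simulated_call` are all hypothetical. No model or API is contacted, and the exact request and response shapes should be checked against Mistral AI's API documentation.

```python
import json

# Hypothetical tool description; the arguments are specified as a JSON Schema.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current temperature for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would query a weather service."""
    fake_data = {"Paris": 21, "Tokyo": 27}
    return {"city": city, "temperature_c": fake_data.get(city)}


# Map tool names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def dispatch(tool_call: dict) -> str:
    """Execute one tool call and return its result as a JSON string."""
    name = tool_call["function"]["name"]
    if name not in REGISTRY:
        raise ValueError(f"unknown tool: {name}")
    # Assumption: arguments arrive as a JSON-encoded string.
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(REGISTRY[name](**args))


if __name__ == "__main__":
    # Simulated model output; a real tool call would come from the model.
    simulated_call = {
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}
    }
    print(dispatch(simulated_call))
```

Keeping a registry of allowed callables, rather than looking up functions by name dynamically, means a malformed or unexpected tool name fails with an error instead of running arbitrary code.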