TL;DR

IBM has introduced two new multilingual embedding models, granite-embedding-97m-multilingual-r2 and granite-embedding-311m-multilingual-r2, under Apache 2.0. They offer improved retrieval quality across 200+ languages, add code retrieval support, and are designed for enterprise deployment.

IBM has released two new open-source multilingual embedding models, granite-embedding-97m-multilingual-r2 and granite-embedding-311m-multilingual-r2, under the Apache 2.0 license, aiming to improve multilingual retrieval and code understanding for enterprise applications.

The models are built on the ModernBERT architecture and support over 200 languages, 52 of which received explicit training for higher-quality retrieval. The 97M-parameter model scores 60.3 on the Multilingual MTEB Retrieval benchmark, outperforming previous models of similar size. The larger 311M-parameter model scores 65.2, ranking second among open models under 500M parameters.

Both models handle context lengths up to 32,768 tokens, support code retrieval across nine programming languages, and are compatible with popular frameworks such as sentence-transformers, LangChain, and Haystack. They are optimized for CPU inference via ONNX and OpenVINO and require no task-specific tuning.
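As a rough illustration of the sentence-transformers route mentioned above, the sketch below embeds a query and a few documents, then ranks the documents by cosine similarity. The Hugging Face model ID (in particular the "ibm-granite/" prefix) is an assumption, since the article gives only the model names, and the helper names embed, cosine, rank, and demo are hypothetical. It is a minimal sketch, not IBM's recommended setup.

```python
import math

# Assumption: the article names the model but not its hub path, so the
# "ibm-granite/" organization prefix here is a guess.
MODEL_ID = "ibm-granite/granite-embedding-97m-multilingual-r2"


def embed(texts, model_id=MODEL_ID):
    """Encode texts with sentence-transformers (imported lazily; downloads the model)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    # normalize_embeddings=True returns unit-length vectors.
    return model.encode(texts, normalize_embeddings=True).tolist()


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def demo():
    """Cross-lingual example: English query against German and Spanish documents."""
    docs = ["Der Vertrag endet im März.", "La factura vence el lunes."]
    vectors = embed(["When does the contract end?"] + docs)
    return [docs[i] for i in rank(vectors[0], vectors[1:])]
```

Calling demo() requires the sentence-transformers package and network access to fetch the model. Swapping in the 311M model would, under the same assumption about the hub path, only mean changing MODEL_ID.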

Why It Matters

This release addresses a key challenge in multilingual NLP: balancing model size against language coverage and retrieval quality. By offering high-performance, open-source models that support over 200 languages, IBM enables broader access and deployment in multilingual and cross-lingual applications, including search, retrieval-augmented generation, and code understanding. The enterprise-ready design emphasizes responsible data handling and deployment suitability.

Background

Prior models like XLM-RoBERTa provided multilingual support but had limitations in context length and retrieval accuracy. The R2 models are a ground-up rebuild using ModernBERT, which incorporates recent advances in transformer architecture, resulting in improved efficiency and performance. The release builds on IBM’s previous efforts to create scalable, responsible NLP tools for enterprise use, with a focus on broad language support and technical robustness.

“The Granite Embedding Multilingual R2 models significantly narrow the gap between size and performance in multilingual embeddings, supporting over 200 languages with enterprise-level quality.”

— IBM Research

“Our models are designed to be plug-and-play, requiring no task-specific tuning, and are compatible with existing frameworks to facilitate easy integration.”

— IBM Data Science Team

What Remains Unclear

It is not yet clear how these models will perform in large-scale, real-world enterprise deployments, or how they will compare with proprietary models in specific use cases. Further testing and user feedback are expected to clarify their practical effectiveness and limitations.

What’s Next

IBM plans to continue evaluating these models in diverse applications and may release updates or additional tools to enhance their deployment. Monitoring user feedback and benchmarking in real-world scenarios will be key next steps.

Key Questions

What are the main differences between the 97M and 311M models?

The 97M model is more compact, with 384-dimensional embeddings, and scores 60.3 on the Multilingual MTEB Retrieval benchmark, a strong result for its size. The 311M model scores higher (65.2) and is suited to more demanding applications. Both models handle context lengths up to 32,768 tokens.
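To make the compactness of 384-dimensional embeddings concrete, the arithmetic below estimates raw vector storage. It is a back-of-the-envelope sketch that assumes float32 values and a flat, uncompressed index with no metadata overhead. The corpus size of one million vectors and the function name index_size_bytes are hypothetical.

```python
def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for a flat index of fixed-size vectors (float32 by default)."""
    return num_vectors * dim * bytes_per_value


# One million 384-dimensional float32 vectors: 1,000,000 * 384 * 4 bytes.
size = index_size_bytes(1_000_000, 384)
print(f"{size / 1e9:.3f} GB")  # prints "1.536 GB"
```

Real vector stores add their own overhead and often support quantization, so actual figures will differ.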

Can these models be used for code retrieval?

Yes, both models support cross-lingual code retrieval across nine programming languages, making them suitable for technical and developer-focused applications.

Are these models ready for enterprise deployment?

Yes, they are designed to be enterprise-ready, with optimized inference options, broad language support, and compliance with responsible data handling practices.

What frameworks are compatible with these models?

They work out of the box with sentence-transformers, transformers, LangChain, LlamaIndex, Haystack, and Milvus, requiring only a one-line model name change for integration.
