TL;DR
Portugal’s AMÁLIA, a €5.5 million European Portuguese LLM, is now operational and outperforms earlier open models on European Portuguese benchmarks. However, key structural questions remain about its openness, the sufficiency of its native-language data, and its optimization goals, and they apply equally to the broader European sovereign-LLM movement.
Portugal’s €5.5 million AMÁLIA large language model is now operational, with the base version released in late 2025, marking a significant step in the country’s AI development efforts. Despite this technical progress, fundamental questions about the model’s openness, native-language data, and strategic objectives remain unanswered, highlighting broader issues in the European sovereign-LLM movement.
AMÁLIA was developed by a consortium of about 60 researchers across Portugal’s leading institutions, including NOVA and IST, and builds on the EuroLLM multilingual foundation through extended pre-training. It was announced in December 2024 and became publicly accessible in October 2025, primarily to academic users. The model currently handles text only, with multimodal capabilities planned for future updates.
Technically, AMÁLIA’s extended pre-training used a mix of 107 billion tokens, of which approximately 5.8 billion came from Portuguese web archives, or about 5.5% of that data. It outperforms previous open models on European Portuguese benchmarks and surpasses Qwen 3-8B on most Portuguese-specific tasks, although it still trails on some benchmarks such as ALBA.
Despite these achievements, questions persist about how open the model truly is, the sufficiency of native-language data, and the strategic goals guiding its development, issues that are central to the broader European effort to develop sovereign-language AI models.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmark tasks. So why are the most important questions still unanswered?
Last month, Duarte O.Carmo published the sharpest public analysis of AMÁLIA — Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic apparatus before doing what almost nobody else in the European-sovereign-LLM discourse has been willing to do publicly: asking hard questions about whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions every national LLM effort needs to answer publicly — and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O.Carmo’s framing maps cleanly onto the structural argument. Each question applies concretely to AMÁLIA, and each is just as unanswered elsewhere in the European sovereign-LLM movement.
The three questions are structurally linked. Q3 (the optimization target) determines Q2 (the data volume needed), which in turn conditions Q1 (whether the openness is sufficient for community contribution). The European sovereign-LLM movement collectively benefits when these questions become standard methodology disclosure rather than exceptional critique.

107 billion tokens. 5.8 billion clearly pt-PT.
This is the most tractable of the three questions, and its answer is surprising. For a model whose entire stated purpose is to prioritize European Portuguese, the native-language share of extended pre-training is about 5.5%. The implications cascade into every other question.
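The share can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two rounded figures quoted in this article; the variable and function names are illustrative. With these rounded inputs the ratio comes out at about 5.4%, in line with the roughly 5.5% cited, and the small gap is presumably a rounding effect in the published token counts (an assumption, not something the source states).

```python
# Figures as quoted in the article (rounded); names here are illustrative only.
TOTAL_TOKENS_B = 107.0  # extended pre-training mix, in billions of tokens
PT_PT_TOKENS_B = 5.8    # tokens from Portuguese web archives, in billions


def native_share(native_b: float, total_b: float) -> float:
    """Return the fraction of the training mix that is native-language data."""
    if total_b <= 0:
        raise ValueError("total token count must be positive")
    return native_b / total_b


share = native_share(PT_PT_TOKENS_B, TOTAL_TOKENS_B)
other_tokens_b = TOTAL_TOKENS_B - PT_PT_TOKENS_B

print(f"pt-PT share of the mix: {share:.1%}")
print(f"Remaining tokens: {other_tokens_b:.1f}B")
```

Put differently, a little over 101 billion of the 107 billion tokens come from sources other than the Portuguese web archives.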
The Olmo standard. AMÁLIA’s current state.
Allen Institute for AI’s Olmo project defines what “fully open” operationally requires. Olmo doesn’t lead frontier benchmarks. That’s not the point. The point is to be the structural reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.
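One way to make an operational standard concrete is a release checklist. The sketch below is illustrative only: the artifact list is an assumption loosely modeled on the kinds of artifacts the Olmo project publishes (weights, training data, training code, intermediate checkpoints, evaluation code, training logs), not criteria enumerated in this article, and the example release is hypothetical and does not describe AMÁLIA’s actual status.

```python
from dataclasses import dataclass, fields


@dataclass
class ReleaseChecklist:
    """Hypothetical openness checklist; the artifact list is an assumption."""
    weights: bool
    training_data: bool
    training_code: bool
    intermediate_checkpoints: bool
    evaluation_code: bool
    training_logs: bool


def missing_artifacts(release: ReleaseChecklist) -> list[str]:
    """Return the names of artifacts that have not been published."""
    return [f.name for f in fields(release) if not getattr(release, f.name)]


def is_fully_open(release: ReleaseChecklist) -> bool:
    """A release counts as fully open only if nothing is missing."""
    return not missing_artifacts(release)


# Hypothetical release used purely for illustration, not a real project.
example = ReleaseChecklist(
    weights=True,
    training_data=False,
    training_code=True,
    intermediate_checkpoints=False,
    evaluation_code=True,
    training_logs=False,
)

print("Fully open:", is_fully_open(example))
print("Missing:", ", ".join(missing_artifacts(example)))
```

A binary checklist like this is deliberately strict: a release with open weights but undisclosed training data would fail it, which is the distinction the “fully open” label needs to survive.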
Four strategic positions. AMÁLIA between two and three.
More than €100M in publicly disclosed European sovereign-LLM funding is spread across the major initiatives. The structural question every project faces is which competitive position it is actually staking out. There are four options, none mutually exclusive, but each requires different commitments.
Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O.Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed — across every European sovereign-LLM project, not just AMÁLIA.
Implications for European AI Sovereignty
The development of AMÁLIA exemplifies Europe’s broader push for sovereign-language AI models, aiming to reduce dependence on US and Chinese models. However, unresolved questions about openness, native data adequacy, and strategic priorities could hinder the effectiveness and transparency of these efforts. The Portuguese case underscores the need for clear standards and goals in national AI initiatives, especially as similar projects emerge across Europe.
European Sovereign-Language Model Efforts and Challenges
Across Europe, multiple countries are home to national or nationally anchored LLM efforts, including Italy’s Minerva, Germany’s Aleph Alpha, and France’s Mistral (the latter two are private companies rather than state projects). These efforts face common structural questions: How open is “fully open”, and what does that mean in practice? How much native-language data is enough? What should be the primary goal: performance, transparency, or sovereignty? Portugal’s AMÁLIA is the latest example, illustrating both progress and persistent uncertainties in this landscape.
“AMÁLIA is an impressive piece of work, but the hard questions about its openness and strategic purpose remain unanswered.”
— Duarte O.Carmo
Unresolved Core Questions About AMÁLIA
It remains unclear how open AMÁLIA truly is, given the lack of detailed transparency on its training data and licensing. The sufficiency of native Portuguese data for long-term performance and adaptability is also uncertain, as is the primary strategic goal—whether to maximize performance, ensure transparency, or promote sovereignty. These uncertainties could influence future development and policy decisions.
Next Steps for AMÁLIA and European LLMs
The final version of AMÁLIA is scheduled for release in June 2026, which will likely include more detailed evaluations and possibly increased transparency. Over the next 12-24 months, the project team may address some of the current gaps, but the broader European sovereign-LLM landscape will also evolve, with ongoing debates about openness, native data, and strategic priorities shaping future policies and investments.
Key Questions
What is the main purpose of AMÁLIA?
AMÁLIA is intended to be a high-performance European Portuguese LLM that reduces dependence on non-European models and promotes AI sovereignty within Portugal and Europe.
How open is AMÁLIA really?
It is not yet clear how transparent or accessible the model’s training data and licensing are, raising questions about its openness and replicability.
Why are the three questions important?
They address core issues of transparency, native-language data sufficiency, and strategic goals, which are vital for the success and credibility of European sovereign-LLMs.
Will AMÁLIA be improved before its final release?
It is likely that the development team will address some of the current gaps, but the extent and nature of these improvements remain uncertain until the June 2026 release.
What does this mean for Europe’s AI future?
The outcome of AMÁLIA and similar projects will influence Europe’s ability to develop independent, transparent, and effective AI models, shaping policy and technological sovereignty.
Source: ThorstenMeyerAI.com