
Tilde AI Releases TildeOpen LLM: An Open-Source Large Language Model with Over 30 Billion Parameters and Support for Most European Languages

Understanding the Target Audience

The target audience for TildeOpen LLM includes AI researchers, business leaders in technology, language service providers, and governmental organizations within the EU. Their pain points revolve around the lack of effective language processing tools for under-represented European languages, compliance with data protection regulations, and the need for scalable AI solutions. Their goals include achieving linguistic equity, enhancing digital sovereignty, and improving the accuracy of AI applications in multilingual contexts. They prefer clear, technical communication that emphasizes practical applications and compliance with regulatory standards.

Overview of TildeOpen LLM

Latvian language-tech firm Tilde has released TildeOpen LLM, an open-source foundational large language model (LLM) purpose-built for European languages, with a sharp focus on under-represented and smaller national and regional languages. This release marks a strategic leap toward linguistic equity and digital sovereignty within the EU.

Under the Hood: Architecture, Training, and Governance

The public release occurred on September 3, 2025, when Tilde made the model freely available on Hugging Face. Built as a 30-billion-parameter dense decoder-only transformer, the model is distributed under the permissive CC-BY-4.0 license and supports a broad range of languages, from Latvian and Lithuanian to Ukrainian, Turkish, and beyond.
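For orientation, here is a minimal sketch of loading the model with the Hugging Face transformers library; the repo ID TildeAI/TildeOpen-30b is an assumption, so verify the exact name on the Hub:

```python
# Minimal sketch: loading TildeOpen from the Hugging Face Hub with transformers.
# The repo ID "TildeAI/TildeOpen-30b" is an assumption; verify the exact name on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TildeAI/TildeOpen-30b"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 30B model needs roughly 60 GB in bf16; shard or quantize as needed
    device_map="auto",
)

prompt = "Latvija ir"  # "Latvia is" in Latvian
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```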

Training ran on the EU’s supercomputers LUMI (Finland) and JUPITER (Germany), using 2 million GPU hours awarded via the European Commission’s Large AI Grand Challenge. The model was trained with scripts based on EleutherAI’s GPT-NeoX framework for roughly 450K optimizer updates, consuming approximately 2 trillion tokens. Training followed a three-stage sampling curriculum: uniform sampling across languages, then natural distribution to boost high-data-volume languages, and a final uniform sweep for balance.
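Tilde has not published the exact sampling schedule, but a toy sketch of how such a three-stage curriculum could assign per-language sampling weights looks like this (the token counts and stage names below are illustrative, not Tilde’s recipe):

```python
# Hypothetical sketch of the three-stage language sampling described above.
# Stage names and corpus sizes are illustrative assumptions.

def stage_weights(token_counts: dict[str, int], stage: str) -> dict[str, float]:
    """Return per-language sampling probabilities for a given training stage."""
    langs = list(token_counts)
    if stage in ("uniform_start", "uniform_end"):
        # Stages 1 and 3: every language sampled equally, regardless of corpus size.
        return {lang: 1.0 / len(langs) for lang in langs}
    if stage == "natural":
        # Stage 2: sample proportionally to available data, boosting high-volume languages.
        total = sum(token_counts.values())
        return {lang: n / total for lang, n in token_counts.items()}
    raise ValueError(f"unknown stage: {stage}")

corpus = {"en": 900_000, "lv": 40_000, "lt": 35_000, "uk": 120_000}  # toy token counts
for stage in ("uniform_start", "natural", "uniform_end"):
    print(stage, stage_weights(corpus, stage))
```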

Key technical specifications include (see the configuration sketch after the list):

  • 60 layers
  • Embedding size 6144
  • 48 attention heads
  • 8192-token context window
  • SwiGLU activations
  • RoPE positional encoding
  • RMSNorm normalization
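These specifications map onto a GPT-NeoX-style configuration roughly as sketched below; key names follow GPT-NeoX conventions, but this is an illustrative reconstruction, not Tilde’s actual config file:

```python
# Illustrative GPT-NeoX-style configuration matching the published specs.
# Key names follow GPT-NeoX conventions; this is not Tilde's actual config file.
tildeopen_config = {
    "num_layers": 60,
    "hidden_size": 6144,
    "num_attention_heads": 48,
    "seq_length": 8192,             # context window
    "activation": "swiglu",
    "pos_emb": "rotary",            # RoPE
    "norm": "rmsnorm",
}

# Rough parameter count: attention + MLP blocks dominate at ~12 * L * d^2,
# i.e. 12 * 60 * 6144**2 ≈ 27B, which lands near 30B once embeddings are included.
approx_params = 12 * 60 * 6144**2
print(f"~{approx_params / 1e9:.1f}B parameters in transformer blocks")
```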

Language Equity and Data Sovereignty

Mainstream models often prioritize English and other major languages, leading to skewed performance for Baltic, Slavic, and other smaller European languages. This under-representation results in poor grammar, awkward phrasing, and hallucinations in generated text.

TildeOpen addresses these issues with an “equitable tokenizer” designed to represent text with similar efficiency across languages, reducing token counts and speeding inference for lesser-represented languages. Organizations can self-host the model in local data centers or secure EU-compliant clouds, ensuring adherence to GDPR and other data-protection mandates. This capability addresses sovereignty concerns associated with US- or Asia-hosted models.
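One way to sanity-check the tokenizer claim is to measure “fertility” (tokens per word) on parallel sentences across languages; here is a minimal sketch, again assuming the TildeAI/TildeOpen-30b repo ID and using illustrative sample sentences:

```python
# Sketch: comparing token counts for parallel sentences to gauge tokenizer equity.
# Repo ID is an assumption; the sample sentences are illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TildeAI/TildeOpen-30b")  # assumed repo ID

samples = {
    "en": "The weather is beautiful today.",
    "lv": "Šodien ir skaists laiks.",
    "lt": "Šiandien oras gražus.",
}

for lang, text in samples.items():
    n_tokens = len(tokenizer(text)["input_ids"])
    n_words = len(text.split())
    print(f"{lang}: {n_tokens} tokens, fertility = {n_tokens / n_words:.2f} tokens/word")
```

A tokenizer that is equitable in this sense should show comparable fertility for the Latvian and Lithuanian sentences as for the English one, rather than splitting the smaller languages into many more subword pieces.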

Strategic Horizon: From Prototype to European AI Infrastructure

TildeOpen serves as a foundational “base” model, with expectations for future versions to include more specialized applications, such as instruction-tuned translation models built atop this core. This initiative positions Latvia, via Tilde, as a tech exporter, aiming to scale European AI infrastructure while preserving linguistic diversity.

On the research side, this release aligns with broader studies of multilingual model behavior that highlight existing gaps. Evaluations indicate that even robust open LLMs can struggle with lexical accuracy in Baltic languages, reinforcing the necessity for localized development.

Summary

TildeOpen LLM reframes EU AI—not merely as regulatory compliance, but as technical stewardship. It is a grounded, high-capacity model with transparent architecture, scalable deployment, and a strong commitment to linguistic equity. It prioritizes substance over hype.

FAQs

Q1: What is TildeOpen LLM?

TildeOpen is a 30B-parameter multilingual large language model trained on EU supercomputers, optimized for European languages, especially under-represented ones.

Q2: How is it different from mainstream LLMs?

Unlike global models that prioritize English, TildeOpen employs an equitable tokenizer and balanced training to ensure fair representation and accuracy across smaller European languages.

Q3: Can organizations self-host the model?

Yes. TildeOpen is open-source under CC-BY-4.0 and can be deployed in local data centers or EU-compliant clouds to meet GDPR and data sovereignty requirements.
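As an illustration, here is a minimal self-hosting sketch using the vLLM inference engine; the repo ID and GPU settings are assumptions to be sized against your own hardware:

```python
# Sketch: self-hosted batch inference with vLLM on local hardware.
# Repo ID and parallelism settings are assumptions; size them to your GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TildeAI/TildeOpen-30b",   # assumed repo ID
    tensor_parallel_size=4,          # e.g. four GPUs for a 30B model
)

params = SamplingParams(max_tokens=100, temperature=0.7)
outputs = llm.generate(["Sveiki! Pastāstiet par Rīgu."], params)  # "Hello! Tell me about Riga."
print(outputs[0].outputs[0].text)
```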

Q4: What are the main use cases?

Use cases include government services, translation, education, AI assistants, speech technologies, and multilingual customer support—any domain requiring accurate European language processing.

Check out the model on Hugging Face for weights and technical details.
