←back to Blog

NuMind AI Releases NuMarkdown-8B-Thinking: A Reasoning Breakthrough in OCR and Document-to-Markdown Conversion

«`html

NuMind AI Releases NuMarkdown-8B-Thinking: A Reasoning Breakthrough in OCR and Document-to-Markdown Conversion

NuMind AI has officially released NuMarkdown-8B-Thinking, an open-source (MIT License) reasoning OCR Vision-Language Model (VLM) that transforms how complex documents are digitized and structured. Unlike traditional OCR systems, NuMarkdown-8B-Thinking not only extracts text but also comprehensively analyzes a document’s layout, structure, and formatting before generating a precise, ready-to-use Markdown file.

Key Features of NuMarkdown-8B-Thinking

This model is the first reasoning VLM specifically designed for converting PDFs, scanned documents, and spreadsheets into clean, structured Markdown. It is particularly useful for Retrieval-Augmented Generation (RAG) workflows, AI-powered knowledge bases, and large-scale document archiving.

How NuMarkdown-8B-Thinking Is Different

NuMarkdown-8B-Thinking introduces a reasoning-first approach to OCR. It generates “thinking tokens,” which are internal reasoning steps that enable the model to understand document layouts before producing the final output. This capability allows it to manage formats and structures that challenge most conventional and even AI-powered OCR systems, including:

  • Multi-column layouts with complex reading orders
  • Tables with merged, nested, or irregular cells
  • Mixed visual elements (images, decorative headers, watermarks)
  • Historical or degraded scans where layout inference is crucial

The number of reasoning tokens varies with complexity—ranging from 20% to 500% of the final Markdown length—demonstrating the model’s depth of reasoning.

Training and Architecture

NuMarkdown-8B-Thinking is a fine-tuned version of Qwen 2.5-VL-7B from Alibaba, a leading open-source multi-modal model. Its training involved two key phases:

  • Supervised Fine-Tuning (SFT) on synthetic document samples, which included raw document input, intermediate reasoning steps (layout parsing, structure inference), and final Markdown representation.
  • Reinforcement Learning with GRPO, utilizing a layout-centric reward that encouraged accurate reconstruction of document formatting and spatial relationships.

This two-stage training process equips NuMarkdown-8B-Thinking with the ability to maintain high accuracy even on challenging layouts that typically require human-level judgment.

Benchmark Results: Outperforming OCR Heavyweights

In independent evaluations and user testing, NuMarkdown-8B-Thinking demonstrates state-of-the-art reasoning for OCR-to-Markdown tasks, outperforming:

  • Generalist models like GPT-4o
  • Specialized OCR-focused models like OCRFlux

It is competitive with large closed-source reasoning models like Gemini 2.5 and ranks just behind elite models like Gemini Flash Reasoning in blind, multi-model user rankings.

Example in Action

Consider a scanned annual report page featuring multi-level headings, sidebars, multiple columns, a financial table with merged cells, and a footer with legal disclaimers. NuMarkdown-8B-Thinking first produces reasoning tokens outlining the structure, then outputs Markdown that accurately reflects both content and layout. This transparent reasoning layer enhances the model’s auditability, which is crucial in enterprise, legal, and archival contexts.

Deployment Options

Whether you are a researcher, developer, or enterprise AI engineer, NuMarkdown-8B-Thinking is ready for integration:

  • Available for direct testing and integration on Hugging Face.
  • Local execution with model weights and quantized GGUF versions for CPU/GPU-friendly deployment.
  • API-friendly, compatible with OpenAI-style APIs and Hugging Face Transformers for rapid integration into pipelines.

Its MIT License ensures full freedom for commercial, academic, or personal projects—eliminating vendor lock-in and costly API gates.

Why This Matters

For industries that depend on accurate document digitization—such as finance, legal, healthcare, and government archives—layout fidelity is as critical as textual accuracy. NuMarkdown-8B-Thinking addresses layout as a reasoning challenge, offering a transparent, verifiable, and high-performance alternative to proprietary document AI solutions.

Check out the model on Hugging Face and visit our GitHub Page for tutorials, codes, and notebooks. Follow us on Twitter and join our 100k+ ML SubReddit. Don’t forget to subscribe to our newsletter.

«`