PaddleOCR-VL

Vision-language OCR with handwriting support, table-to-HTML, and prompt-based extraction

PaddleOCR-VL is a vision-language OCR model with prompting capabilities. It handles handwriting, old documents, and complex layouts, converts tables and charts to HTML, and extracts embedded images directly.

When to use:

Documents with handwriting mixed with printed text
Tables and structured forms that need HTML-format output
Prompt-guided extraction of specific fields from documents

Input: Image or document file (PNG, JPG, PDF, TIFF) + optional fine-tuned checkpoint

Output: Extracted text, formatted output, and metadata
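As a rough illustration of the accepted input types, the sketch below checks a file's extension before it is submitted. The helper name `is_supported_input` and the extension set are assumptions made for this example; in particular, treating `.jpeg` and `.tif` as aliases for JPG and TIFF is a guess, not something this page confirms.

```python
from pathlib import Path

# Extensions assumed to correspond to the listed input types (PNG, JPG, PDF, TIFF).
# ".jpeg" and ".tif" are assumed aliases and are not confirmed by this page.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf", ".tif", ".tiff"}


def is_supported_input(path: str) -> bool:
    """Return True if the file extension matches one of the listed input types."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only looks at the file name; it does not verify that the file contents are actually a valid image or PDF.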

Model Settings

Output Format (default: markdown, required, options: markdown / json / html) Format of the OCR output.

markdown: Clean text with structure - best for NLP pipelines
json: Structured key-value output - best for programmatic field extraction
html: Full layout-preserving HTML - best for visual rendering
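The three options and the markdown default can be mirrored in client code so that an invalid value is caught before a job is configured. This is a minimal sketch under stated assumptions: `OutputFormat` and `parse_output_format` are hypothetical names introduced here, not part of any PaddleOCR-VL API.

```python
from enum import Enum


class OutputFormat(str, Enum):
    """The three output formats listed above (hypothetical client-side type)."""

    MARKDOWN = "markdown"
    JSON = "json"
    HTML = "html"


# The documented default for the Output Format setting.
DEFAULT_OUTPUT_FORMAT = OutputFormat.MARKDOWN


def parse_output_format(value=None):
    """Normalize a user-supplied format name, falling back to the default."""
    if value is None:
        return DEFAULT_OUTPUT_FORMAT
    try:
        return OutputFormat(value.strip().lower())
    except ValueError:
        allowed = " / ".join(f.value for f in OutputFormat)
        raise ValueError(
            f"Unsupported output format {value!r}; expected one of: {allowed}"
        ) from None
```

For example, `parse_output_format("HTML")` yields `OutputFormat.HTML`, while an unlisted value such as `"xml"` raises a `ValueError` naming the allowed options.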

Detect Handwriting (default: true) Enable specialized handwriting recognition.

Enable for any document that may contain handwritten text
Disable for purely printed documents to improve speed

Convert Tables to HTML (default: true) Convert detected tables into HTML table elements.

Enable when table structure needs to be preserved
Disable for plain text extraction where table structure is not needed
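When table conversion is enabled, downstream code typically needs to turn the HTML table elements back into rows and cells. The sketch below does this with Python's standard `html.parser` module. It assumes the output contains plain, non-nested `<table>` / `<tr>` / `<td>` / `<th>` markup; the exact HTML the model emits is not specified on this page, and `TableExtractor` and `extract_tables` are names invented for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in an HTML string.

    Nested tables are not handled; this is a sketch, not a full HTML table parser.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(html: str):
    """Return a list of tables, each a list of rows, each a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each table comes back as a list of rows, so the result can be passed on to CSV writers or data-frame constructors without further HTML handling.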
