LayoutLMv3

Document understanding model combining text, layout, and image for forms and invoices

LayoutLMv3 jointly processes text tokens, their spatial positions (bounding boxes), and the document image. Because it sees where text sits on the page as well as what it says, it is well suited to invoice parsing, form understanding, and contract extraction.
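To make the "spatial positions" input concrete: the Hugging Face implementation of LayoutLMv3 expects each token's bounding box scaled to a 0-1000 grid, independent of image size. The sketch below shows that scaling for pixel-space boxes. The function name `normalize_box` is hypothetical, and whether this tool performs the step internally is not stated on this page.

```python
def normalize_box(box, width, height):
    """Scale a pixel-space box (x0, y0, x1, y1) to the 0-1000 grid.

    Hypothetical helper for illustration; not part of this tool's API.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]
    # Clamp so OCR boxes that spill slightly past the image edge stay valid.
    return [min(1000, max(0, v)) for v in scaled]
```

For example, a box at pixels (50, 100, 200, 150) on a 1000 x 500 image maps to [50, 200, 200, 300].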

When to use:

Extracting key-value pairs from invoices, receipts, or forms
Document question answering ("What is the total amount on this invoice?")
Named entity recognition in scanned business documents
Document classification (contract type, form category)

Input: Document image (PNG) + optional question + optional fine-tuned checkpoint

Output: Answer or extracted entities with bounding boxes, confidence scores, and optionally an annotated document image

Model Settings

Task (default: document_qa, options: document_qa / classification / token_classification / key_value_extraction) The document understanding task to perform.

document_qa: Answer a question about the document (requires a question input)
classification: Classify the document into a category
token_classification: Label tokens (e.g., named entity recognition)
key_value_extraction: Extract all key-value pairs from the document
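The task options above can be checked before a request is sent. This is a minimal sketch, assuming settings are passed as plain values; `VALID_TASKS` and `validate_request` are hypothetical names chosen for illustration, and only the four task identifiers and the `document_qa` default come from this page.

```python
# Task identifiers as listed in the Model Settings section.
VALID_TASKS = {
    "document_qa",
    "classification",
    "token_classification",
    "key_value_extraction",
}


def validate_request(task="document_qa", question=None):
    """Check a task/question pair and return it as a dict.

    Hypothetical helper; the real request format is not documented here.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    # document_qa is the only task that requires a question input.
    if task == "document_qa" and not (question and question.strip()):
        raise ValueError("document_qa requires a question input")
    return {"task": task, "question": question}
```

Calling `validate_request(question="What is the total amount on this invoice?")` uses the default task, while `validate_request(task="classification")` passes without a question.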

Inference Settings

Use OCR (default: true) Run OCR on the document image to extract text before processing.

true: Use for scanned documents or images where text is not embedded
false: Disable for digital PDFs with embedded text to skip OCR
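The choice between the two values can be expressed as a simple rule: run OCR unless usable text was already extracted from the document. The sketch below assumes the caller holds a list of embedded words (or nothing) for the page; `choose_use_ocr` is a hypothetical helper, not a function provided by this tool.

```python
def choose_use_ocr(embedded_words):
    """Return the Use OCR setting for a document.

    Hypothetical helper: returns False only when the document already
    carries non-empty embedded text, True otherwise (the default).
    """
    has_text = any(word.strip() for word in (embedded_words or []))
    return not has_text
```

A scanned page with no embedded text (`None` or an empty list) keeps OCR enabled, while a digital document that yields words such as `["Invoice", "#123"]` disables it.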
