
LayoutLMv3

Document understanding model combining text, layout, and image for forms and invoices

LayoutLMv3 jointly processes text tokens, their spatial positions (bounding boxes), and the document image. Because it sees where each word sits on the page as well as what it says, it is well suited to invoice parsing, form understanding, and contract extraction.
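Each word reaches the model paired with its bounding box, and the Hugging Face `transformers` implementation of LayoutLMv3 expects those boxes as (x0, y0, x1, y1), top-left and bottom-right corners, scaled to a 0 to 1000 grid relative to the page size. The sketch below shows that preparation step under that assumption; `normalize_box` and `pair_words_with_boxes` are hypothetical helpers for illustration, not part of this block's interface, and with Use OCR enabled the block presumably handles this for you.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1): top-left and bottom-right corners


def normalize_box(box: Tuple[float, float, float, float], width: int, height: int) -> Box:
    """Scale a pixel-space box to the 0-1000 grid, relative to the page image size."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")

    def scale(value: float, extent: int) -> int:
        # Truncate to an integer and clamp, so boxes touching the page edge stay in range.
        return min(max(int(1000 * value / extent), 0), 1000)

    x0, y0, x1, y1 = box
    return (scale(x0, width), scale(y0, height), scale(x1, width), scale(y1, height))


def pair_words_with_boxes(words: List[str],
                          pixel_boxes: List[Tuple[float, float, float, float]],
                          width: int, height: int) -> List[Tuple[str, Box]]:
    """Attach one normalized box to each word; both lists must be in the same order."""
    if len(words) != len(pixel_boxes):
        raise ValueError("every word needs exactly one bounding box")
    return [(word, normalize_box(b, width, height)) for word, b in zip(words, pixel_boxes)]
```

For an 850 x 1100 px scan, a word at pixels (600, 900, 700, 930) maps to (705, 818, 823, 845). Scaling to a fixed grid makes the layout signal independent of scan resolution.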

When to use:

  • Extracting key-value pairs from invoices, receipts, or forms
  • Document question answering ("What is the total amount on this invoice?")
  • Named entity recognition in scanned business documents
  • Document classification (contract type, form category)

Input: Document image (PNG) + optional question (required for the document_qa task) + optional fine-tuned checkpoint

Output: Answer or extracted entities with bounding boxes, confidence scores, and optionally an annotated document image
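If you assemble these inputs yourself before passing them to this block, a quick pre-flight check can catch mistakes early. This is a minimal sketch: `validate_inputs` is a hypothetical helper, and the only rules it enforces are the ones stated on this page (a PNG image, one of the four documented task values, and a question when the task is document_qa).

```python
from typing import List, Optional

# Every PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# The four task values listed under Model Settings.
TASKS = {"document_qa", "classification", "token_classification", "key_value_extraction"}


def validate_inputs(image_bytes: bytes, task: str = "document_qa",
                    question: Optional[str] = None) -> List[str]:
    """Return a list of problems with the inputs; an empty list means they look valid."""
    problems: List[str] = []
    if not image_bytes.startswith(PNG_SIGNATURE):
        problems.append("document image must be a PNG")
    if task not in TASKS:
        problems.append(f"unknown task: {task!r}")
    if task == "document_qa" and not (question and question.strip()):
        problems.append("document_qa requires a non-empty question")
    return problems
```

Collecting every problem instead of raising on the first one lets a caller report all issues at once.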

Model Settings

Task (default: document_qa; options: document_qa / classification / token_classification / key_value_extraction)
The document understanding task to perform.

  • document_qa: Answer a question about the document — requires a question input
  • classification: Classify the document into a category
  • token_classification: Label tokens (e.g., named entity recognition)
  • key_value_extraction: Extract all key-value pairs from the document
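With token_classification, the model labels each word individually, so adjacent labels must be merged into entities afterwards. The sketch below assumes a BIO scheme (B- begins an entity, I- continues it, O marks words outside any entity), the convention used in common LayoutLMv3 fine-tuning setups on forms datasets such as FUNSD. The KEY/VALUE label names and the `group_bio_labels` helper are hypothetical; a fine-tuned checkpoint may use a different label set.

```python
from typing import List, Optional, Tuple


def group_bio_labels(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge per-word BIO labels (e.g. B-KEY, I-KEY, O) into (entity_type, text) spans."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    entities: List[Tuple[str, List[str]]] = []
    current: Optional[Tuple[str, List[str]]] = None  # span currently being extended
    for word, label in zip(words, labels):
        if label.startswith("I-") and current is not None and current[0] == label[2:]:
            current[1].append(word)  # continues the open span of the same type
        elif label.startswith(("B-", "I-")):
            # B- always starts a new span; a stray I- (no matching open span) is treated like B-.
            current = (label[2:], [word])
            entities.append(current)
        else:
            current = None  # "O" (or any other tag) closes the open span
    return [(etype, " ".join(parts)) for etype, parts in entities]
```

For example, the words `Total 1,250.00 EUR` tagged `B-KEY B-VALUE I-VALUE` group into `("KEY", "Total")` and `("VALUE", "1,250.00 EUR")`.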

Inference Settings

Use OCR (default: true)
Run OCR on the document image to extract text before processing.

  • true: Use for scanned documents or images where text is not embedded
  • false: Use for digital PDFs with embedded text, skipping the OCR step
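With OCR disabled, words and boxes come from the PDF's own text layer. One detail worth checking in that case: PDF user space puts the origin at the bottom-left with y increasing upward, whereas LayoutLMv3's boxes use a top-left origin with y increasing downward. This sketch assumes you already have (word, box) pairs in PDF points from some text-extraction tool; `pdf_box_to_top_left` is a hypothetical helper, and whether this block performs the flip internally is not documented here. The flipped boxes would still need scaling to the model's 0 to 1000 range.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def pdf_box_to_top_left(box: Box, page_height: float) -> Box:
    """Convert a box from PDF user space (origin bottom-left, y grows upward)
    to image-style coordinates (origin top-left, y grows downward)."""
    x0, y0, x1, y1 = box
    # The PDF box's upper edge (y1) becomes the smaller, "top" coordinate after the flip.
    return (x0, page_height - y1, x1, page_height - y0)


def embedded_words_to_top_left(words: List[Tuple[str, Box]],
                               page_height: float) -> List[Tuple[str, Box]]:
    """Flip every (word, box) pair taken from a PDF text layer."""
    return [(text, pdf_box_to_top_left(box, page_height)) for text, box in words]
```

On a US Letter page (612 x 792 pt), a word at (72, 700, 110, 712) becomes (72, 80, 110, 92).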
