LayoutLMv3
Document understanding model combining text, layout, and image for forms and invoices
LayoutLMv3 jointly processes text tokens, their spatial positions (bounding boxes), and the document image, which makes it well suited to invoice parsing, form understanding, and contract extraction.
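To make the "spatial positions" input concrete: LayoutLM-family models, as implemented in Hugging Face transformers, expect each word's bounding box scaled to a 0-1000 grid that is independent of the image resolution. The following is a minimal sketch of that scaling. The helper name `normalize_box` and the sample words and pixel boxes are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def normalize_box(box: Box, width: int, height: int) -> List[int]:
    """Scale a pixel-space box to the 0-1000 grid used by LayoutLM-style models."""
    if width <= 0 or height <= 0:
        raise ValueError("page width and height must be positive")
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]
    # Clamp so boxes that spill past the page edge stay in range.
    return [min(1000, max(0, v)) for v in scaled]


# Hypothetical OCR output for an 800x1000 pixel invoice page.
words = ["Total", "$1,250.00"]
pixel_boxes = [(400, 900, 480, 930), (500, 900, 640, 930)]
boxes = [normalize_box(b, width=800, height=1000) for b in pixel_boxes]
```

Each entry in `boxes` lines up with the word at the same index in `words`, which is how the text and layout inputs stay paired.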
When to use:
- Extracting key-value pairs from invoices, receipts, or forms
- Document question answering ("What is the total amount on this invoice?")
- Named entity recognition in scanned business documents
- Document classification (contract type, form category)
Input: Document image (PNG) + optional question + optional fine-tuned checkpoint
Output: Answer or extracted entities with bounding boxes, confidence scores, and optionally an annotated document image
Model Settings
Task (default: document_qa, options: document_qa / classification / token_classification / key_value_extraction)
The document understanding task to perform.
- document_qa: Answer a question about the document (requires a question input)
- classification: Classify the document into a category
- token_classification: Label tokens (e.g., named entity recognition)
- key_value_extraction: Extract all key-value pairs from the document
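The task options above imply one constraint: document_qa needs a question, while the other tasks do not. A minimal sketch of checking a request before running the model could look like the following. The function name `validate_request` and the returned dictionary shape are assumptions for illustration, not part of any documented API.

```python
from typing import Optional

# Task names copied from the options listed above.
TASKS = (
    "document_qa",
    "classification",
    "token_classification",
    "key_value_extraction",
)
DEFAULT_TASK = "document_qa"


def validate_request(task: str = DEFAULT_TASK, question: Optional[str] = None) -> dict:
    """Check a task/question pair and return a normalized request (hypothetical shape)."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    if task == "document_qa" and not (question and question.strip()):
        raise ValueError("document_qa requires a question input")
    # The question is only meaningful for document_qa; drop it for other tasks.
    return {"task": task, "question": question if task == "document_qa" else None}


request = validate_request(question="What is the total amount on this invoice?")
```

Failing early on a missing question avoids running OCR and the model on a request that cannot produce an answer.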
Inference Settings
Use OCR (default: true)
Run OCR on the document image to extract text before processing.
- true: Use for scanned documents or images where text is not embedded
- false: Skip OCR for digital PDFs that already have embedded text
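Turning OCR off shifts work to the caller: the model still needs words and their bounding boxes, so they have to come from somewhere else, such as the text layer of a digital PDF. The sketch below illustrates that trade-off. The helper `plan_processor_inputs` and its return shape are hypothetical, and the mapping of this setting onto an `apply_ocr` flag is an assumption about how the tool is wired.

```python
from typing import Optional, Sequence


def plan_processor_inputs(
    use_ocr: bool = True,
    words: Optional[Sequence[str]] = None,
    boxes: Optional[Sequence[Sequence[int]]] = None,
) -> dict:
    """Decide what must be passed alongside the image (hypothetical helper)."""
    if use_ocr:
        # OCR derives words and boxes from the image itself, so any
        # caller-supplied words or boxes are ignored in this sketch.
        return {"apply_ocr": True, "words": None, "boxes": None}
    if words is None or boxes is None:
        raise ValueError("with OCR disabled, words and boxes must be supplied")
    if len(words) != len(boxes):
        raise ValueError("each word needs exactly one bounding box")
    return {
        "apply_ocr": False,
        "words": list(words),
        "boxes": [list(b) for b in boxes],
    }


# Scanned page: let OCR find the text.
scanned = plan_processor_inputs(use_ocr=True)

# Digital document: reuse embedded text (boxes already on the 0-1000 grid).
digital = plan_processor_inputs(
    use_ocr=False,
    words=["Invoice", "#1042"],
    boxes=[[100, 50, 220, 80], [230, 50, 300, 80]],
)
```

In Hugging Face transformers, the corresponding option on the LayoutLMv3 processor is `apply_ocr`, and its built-in OCR relies on Tesseract through pytesseract. Whether this tool uses that processor internally is not stated here.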