PaddleOCR-VL
Vision-language OCR with handwriting support, table-to-HTML, and prompt-based extraction
PaddleOCR-VL is a vision-language OCR model with prompting capabilities. It handles handwriting, old documents, and complex layouts, converts tables and charts to HTML, and extracts embedded images directly.
When to use:
- Documents with handwriting mixed with printed text
- Tables and structured forms that need HTML-format output
- Prompt-guided extraction of specific fields from documents
Input: Image or document file (PNG, JPG, PDF, TIFF) + optional fine-tuned checkpoint
Output: Extracted text, formatted output, and metadata
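As a rough illustration of the input formats listed above, a caller could filter files by extension before sending them to the model. This is a minimal sketch: the helper name `is_supported_input` and the constant `SUPPORTED_EXTENSIONS` are hypothetical and not part of PaddleOCR-VL, and the check covers only the four extensions named in the Input line.

```python
from pathlib import Path

# Extensions mirror the formats named in the Input line (PNG, JPG, PDF, TIFF).
# Both names below are hypothetical and exist only for this illustration.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".pdf", ".tiff"}


def is_supported_input(path: str) -> bool:
    """Return True if the file extension matches one of the listed input formats."""
    # Compare case-insensitively so "SCAN.PNG" and "scan.png" are treated alike.
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only looks at the file name; it does not confirm that the file contents are a valid image or PDF.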
Model Settings
Output Format (default: markdown, required, options: markdown / json / html) Format of the OCR output.
- markdown: Clean text with structure — best for NLP pipelines
- json: Structured key-value output — best for programmatic field extraction
- html: Full layout-preserving HTML — best for visual rendering
Detect Handwriting (default: true) Enable specialized handwriting recognition.
- Enable for any document that may contain handwritten text
- Disable for purely printed documents to improve speed
Convert Tables to HTML (default: true) Convert detected tables into HTML table elements.
- Enable when table structure needs to be preserved
- Disable for plain text extraction where table structure is not needed
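The three settings above and their defaults can be pictured as a small configuration object. The sketch below is an assumption about how a caller might represent them in Python: the class name `PaddleOcrVlSettings` and its field names are hypothetical and are not the model's actual API. Only the option values and defaults come from the settings described above.

```python
from dataclasses import dataclass

# Allowed values for the Output Format setting, as listed above.
OUTPUT_FORMATS = ("markdown", "json", "html")


@dataclass(frozen=True)
class PaddleOcrVlSettings:
    """Hypothetical container for the documented settings and their defaults."""

    output_format: str = "markdown"       # Output Format
    detect_handwriting: bool = True       # Detect Handwriting
    convert_tables_to_html: bool = True   # Convert Tables to HTML

    def __post_init__(self) -> None:
        # Reject any format that is not one of the three documented options.
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(
                f"output_format must be one of {OUTPUT_FORMATS}, "
                f"got {self.output_format!r}"
            )


# Purely printed documents headed for plain text extraction:
# both optional features are switched off, per the guidance above.
printed_plain_text = PaddleOcrVlSettings(
    detect_handwriting=False,
    convert_tables_to_html=False,
)

# Structured forms where fields are extracted programmatically.
form_extraction = PaddleOcrVlSettings(output_format="json")
```

Validating the format at construction time surfaces a mistyped option before any document is processed.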