Nanonets OCR2 3B
Advanced OCR with structured Markdown, semantic tagging, and document feature extraction
Nanonets-OCR2-3B is a 3B-parameter vision-language OCR model. It produces structured Markdown or HTML output with semantic tagging, handles signatures, watermarks, checkboxes, flowcharts, and handwriting, and supports multiple languages.
When to use:
- Extracting text from invoices, forms, and scanned documents
- Preserving document structure (headings, tables, lists) in output
- Documents containing signatures or watermarks that need to be identified
Input: Image or document file (PNG, JPG, PDF, TIFF) + optional fine-tuned checkpoint
Output: Extracted text, formatted output (Markdown/HTML), and OCR metadata
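A pre-flight check on the input file type can catch unsupported files before they reach the model. The following is a minimal sketch, not part of the model's own interface: the helper name `is_supported_input` is hypothetical, and accepting the alternate spellings `.jpeg` and `.tif` is an assumption beyond the four types listed above.

```python
from pathlib import Path

# File types listed in the Input description. The alternate spellings
# (.jpeg, .tif) are an assumption and are not confirmed by the source.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf", ".tif", ".tiff"}


def is_supported_input(path: str) -> bool:
    """Return True if the file extension matches a listed input type.

    Hypothetical helper: it inspects only the extension, not the file contents.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check is case-insensitive, so `invoice.PDF` passes while `notes.docx` or a file with no extension does not.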
Model Settings
Output Format (default: markdown, required, options: markdown / html) Format of the structured output.
- markdown: Best for downstream text processing and LLM pipelines
- html: Best for rendering in a browser or preserving visual layout
Extract Signatures (default: true) Detect and tag signatures in the document.
- Enable when processing contracts or signed forms
- Disable to save processing time when signatures are not relevant
Extract Watermarks (default: true) Detect and tag watermarks.
- Enable for compliance or verification use cases
- Disable for clean text extraction where watermarks are noise
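The settings above can be pictured as a small configuration object. This is an illustrative sketch only: the class name `OcrSettings` and its field names are hypothetical and do not reflect a confirmed API. Only the defaults and the allowed output formats come from the descriptions above.

```python
from dataclasses import dataclass, asdict

# Allowed values for the Output Format setting, as documented above.
ALLOWED_FORMATS = ("markdown", "html")


@dataclass(frozen=True)
class OcrSettings:
    """Hypothetical container mirroring the documented model settings."""

    output_format: str = "markdown"   # documented default
    extract_signatures: bool = True   # documented default
    extract_watermarks: bool = True   # documented default

    def __post_init__(self) -> None:
        # Reject any format other than the two documented options.
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(
                f"output_format must be one of {ALLOWED_FORMATS}, "
                f"got {self.output_format!r}"
            )


# Signed contracts rendered in a browser: HTML output, tagging left on.
contract_settings = OcrSettings(output_format="html")

# Clean text for an LLM pipeline: Markdown output, tagging turned off.
clean_text_settings = OcrSettings(
    extract_signatures=False,
    extract_watermarks=False,
)
```

Calling `asdict(clean_text_settings)` yields a plain dictionary that could be passed on to whatever interface actually runs the model.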