
Multimodal

Models that process or generate multiple modalities at once

Multimodal tasks work with multiple types of data at the same time.

Examples: text + images, text + audio, text + video.

Common Multimodal Tasks

  • Image-Text-to-Text: Generate text from a combination of images and text prompts
  • Visual Question Answering: Answer questions about images
  • Document Question Answering: Answer questions from documents or PDFs
  • Audio-to-Text: Convert audio, such as recorded speech, into coherent text
  • Video-to-Text: Generate text based on video content
  • Visual Document Retrieval: Retrieve documents or visuals based on multimodal queries
  • Any-to-Any: Accept any combination of input types and produce any combination of output types
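
One way to read the task list above is as a mapping from input modalities to output modalities. The sketch below is only an illustration of that reading: the `Task` class, the `candidate_tasks` helper, the lowercase task names, and the modality assigned to each task are assumptions made for this example, not part of any library. Visual Document Retrieval is left out because it returns documents rather than generated text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a multimodal task by its modalities."""
    name: str
    inputs: frozenset
    outputs: frozenset


# Assumed modality assignments, read off the task descriptions above.
TASKS = [
    Task("image-text-to-text", frozenset({"image", "text"}), frozenset({"text"})),
    Task("visual-question-answering", frozenset({"image", "text"}), frozenset({"text"})),
    Task("document-question-answering", frozenset({"document", "text"}), frozenset({"text"})),
    Task("audio-to-text", frozenset({"audio"}), frozenset({"text"})),
    Task("video-to-text", frozenset({"video"}), frozenset({"text"})),
]


def candidate_tasks(inputs, outputs):
    """Return the names of tasks whose modalities match exactly.

    Falls back to "any-to-any" when no specific task matches.
    """
    wanted_in, wanted_out = frozenset(inputs), frozenset(outputs)
    matches = [
        task.name
        for task in TASKS
        if task.inputs == wanted_in and task.outputs == wanted_out
    ]
    return matches or ["any-to-any"]


if __name__ == "__main__":
    print(candidate_tasks({"image", "text"}, {"text"}))
    print(candidate_tasks({"text"}, {"image"}))
```

An image plus a text prompt matches both Image-Text-to-Text and Visual Question Answering in this sketch, which reflects how closely the two tasks overlap. A combination that matches no specific entry falls through to Any-to-Any.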
