HTTP Connector

Download data from any publicly accessible URL. Connect to raw files, APIs, and web resources directly.

Setup Instructions

1. Navigate to Data Integrations

Go to the Data Integrations tab in your flow.

2. Select HTTP Integration

Click Select an Integration, type HTTP in the search, and click Connect.

3. Configure the Connector

In the connector setup form, fill in:

  • Connector Name: Give your connector a descriptive name (e.g., "Titanic Dataset")
  • URL: The full URL to the file or resource you want to download
  • Headers (Optional): Add custom HTTP headers if needed for authentication or other purposes
  • Folder (Optional): Select a destination folder in the file manager
    • If not specified, data will be stored in the root directory

4. Find Your Data URL

There are several ways to get data URLs:

From GitHub (Raw Files):

  1. Navigate to a file in a public GitHub repository
  2. Click the Raw button in the upper right corner
  3. Copy the URL from your browser

Example: https://raw.githubusercontent.com/datasciencedojo/datasets/refs/heads/master/titanic.csv
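
If you only have the address of a file's regular GitHub page, the raw URL can usually be derived without opening the browser. The sketch below is illustrative only and is not part of the connector: the function name github_blob_to_raw is hypothetical, and it assumes the common page layout https://github.com/<owner>/<repo>/blob/<ref>/<path>.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url: str) -> str:
    """Convert a github.com file page URL to its raw.githubusercontent.com form.

    Hypothetical helper; assumes the path looks like
    /<owner>/<repo>/blob/<ref>/<path to file>.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")

    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a file page URL: {url}")

    owner, repo = segments[0], segments[1]
    # Everything after "blob" (the ref plus the file path) is kept unchanged.
    rest = "/".join(segments[3:])
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{rest}"
```

For example, github_blob_to_raw("https://github.com/mwaskom/seaborn-data/blob/master/iris.csv") yields the Iris URL in the raw.githubusercontent.com form that the connector's URL field expects.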

From Other Sources:

  • Direct download links from data repositories
  • Public API endpoints that return data
  • Cloud storage public URLs (S3, Google Cloud Storage, etc.)
  • Any publicly accessible file URL

5. Optional: Add Headers

If the URL requires authentication or special headers, add them in the Headers field of the connector.

Common Header Examples:

  • Authorization: Bearer your-token-here (for API authentication)
  • Accept: application/json (specify response format)
  • User-Agent: YourApp/1.0 (identify your application)

Headers should be added in key-value format:

Authorization: Bearer abc123xyz
Accept: application/json
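
To check a set of headers before pasting them into the connector, the key-value format above can be parsed in a few lines. This is a minimal sketch for local checking; the function name parse_headers is an assumption, and it does not describe how the connector itself reads the field.

```python
def parse_headers(text: str) -> dict[str, str]:
    """Parse 'Name: value' lines into a dict (hypothetical helper)."""
    headers: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'Name: value', got: {line!r}")
        headers[name.strip()] = value.strip()
    return headers


example = parse_headers(
    "Authorization: Bearer abc123xyz\n"
    "Accept: application/json\n"
)
```

A line without a colon raises ValueError, which catches the most common formatting slip before the connection is created.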

6. Create the Connection

After filling in all details, click Create Connection.

The system will:

  • Fetch the data from the specified URL
  • Download the file to your file manager
  • Begin the initial data synchronization

7. Monitor Sync Status

  1. Navigate to Data Synchronization to see the import progress
  2. The connector will download the file from the URL
  3. Once complete, the file will be available for use

8. Access Your Data

  1. Once the sync is complete, go to File Manager
  2. Navigate to the folder you specified (or root directory)
  3. You'll see the downloaded file
  4. Click on the file to preview the data
  5. The data is now ready to use in your AI pipelines and flows

Use Cases

Common HTTP Connector Use Cases:

  1. GitHub Raw Files

    • Download datasets from GitHub repositories
    • Access CSV, JSON, or other data files
    • Example: https://raw.githubusercontent.com/username/repo/main/data.csv
  2. Public APIs

    • Connect to REST APIs that return data
    • Download JSON or XML responses
    • Use with API endpoints that provide data dumps
  3. Cloud Storage

    • Access publicly shared files from S3, GCS, or Azure Blob Storage
    • Download files with public URLs
    • Sync data from shared cloud folders
  4. Data Repositories

    • Download from UCI Machine Learning Repository
    • Access datasets from government open data portals
    • Connect to research data repositories
  5. Regular File Updates

    • Set up sync jobs to periodically fetch updated files
    • Monitor data sources that update regularly
    • Keep your datasets current with external sources

Best Practices

  • Test URLs First: Make sure the URL is publicly accessible before creating the connector
  • Use HTTPS: Prefer HTTPS URLs for secure data transfer
  • Check File Sizes: Large files may take time to download
  • Set Up Sync Jobs: For data that updates regularly, create a sync job to keep it current
  • Handle Authentication: If the URL requires authentication, use the Headers field
  • Verify Data Format: Ensure the downloaded file format is supported (CSV, JSON, Parquet, etc.)
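
Several of these checks can be run locally before creating the connector. The sketch below uses only the Python standard library; the names build_probe and probe are hypothetical. It assumes the server answers HEAD requests, which not every host does, so a failed probe does not necessarily mean the connector will fail.

```python
import urllib.request
from urllib.parse import urlsplit


def build_probe(url, headers=None, require_https=True):
    """Build a HEAD request for url, rejecting unsupported schemes."""
    scheme = urlsplit(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    if require_https and scheme != "https":
        raise ValueError("plain HTTP URL; prefer HTTPS for secure transfer")
    return urllib.request.Request(url, headers=headers or {}, method="HEAD")


def probe(url, headers=None, timeout=10.0):
    """Send the HEAD request and return (status, size in bytes or None)."""
    request = build_probe(url, headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
        size = int(length) if length is not None else None
        return response.status, size
```

Calling probe(url) on one of your own URLs returns the HTTP status and, when the server reports it, the file size, which gives a rough idea of how long the download may take.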

Example URLs

Here are some example public datasets you can try:

  • Titanic Dataset: https://raw.githubusercontent.com/datasciencedojo/datasets/refs/heads/master/titanic.csv
  • Iris Dataset: https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv
  • Tips Dataset: https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv

Paste any of these URLs into the URL field of the HTTP connector to start downloading data.
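
To look at one of these files before connecting it, you can fetch it locally and print the first rows. This is a rough sketch using the standard library only; fetch_text and preview_csv are hypothetical names, and the code assumes the file is UTF-8 encoded CSV with a header row.

```python
import csv
import io
import urllib.request
from itertools import islice


def fetch_text(url, timeout=30.0):
    """Download url and decode the body as UTF-8 (assumed encoding)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def preview_csv(text, n=5):
    """Return (header, first n data rows) from CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])       # empty list if the text has no rows
    rows = list(islice(reader, n))  # at most n rows after the header
    return header, rows


if __name__ == "__main__":
    url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
    header, rows = preview_csv(fetch_text(url))
    print(header)
    for row in rows:
        print(row)
```

The parsing step is kept separate from the download so it can be reused on a file you have already saved.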
