Hugging Face Connector

Connect to Hugging Face Hub datasets. Access thousands of datasets for AI and machine learning projects.

Setup Instructions

1. Navigate to Data Integrations

Go to the Data Integrations tab in your flow.

2. Select Hugging Face Integration

Click Select an Integration, type Hugging Face in the search, and click Connect.

3. Create Hugging Face Access Token (Optional)

Public datasets need no token. For private or gated datasets, create one as follows:

Go to your Hugging Face Settings and open the Access Tokens page
Click New token
Give your token a name (e.g., "AIcuFlow Connector")
Select token type:
- Read: For downloading datasets only (recommended)
- Write: If you plan to upload data back
Click Generate token
Important: Copy the token immediately - you won't be able to see it again
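
Once the token exists, it is safer to keep it out of scripts and shared notebooks. A minimal sketch, assuming the token is stored in the `HF_TOKEN` environment variable (the name the `huggingface_hub` library reads by default). The helpers `read_token` and `mask_token` are hypothetical, and the `hf_` prefix check reflects how current user access tokens look rather than a documented guarantee:

```python
import os
from typing import Optional


def read_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token from the environment, or None for public-only access."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        return None
    # Assumption: user access tokens currently start with "hf_".
    if not token.startswith("hf_"):
        raise ValueError(f"{env_var} does not look like a Hugging Face token")
    return token


def mask_token(token: str) -> str:
    """Hide all but the last four characters so the value is safe to log."""
    if len(token) <= 4:
        return "*" * len(token)
    return "*" * (len(token) - 4) + token[-4:]
```

Returning `None` when the variable is unset mirrors the connector's behavior of treating the token as optional for public datasets.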

4. Configure the Connector

Back in the connector setup, fill in:

Connector Name: Give your connector a descriptive name (e.g., "HuggingFace Datasets")
Access Token (Optional): Paste your token (only required for private/gated datasets)
Folder (Optional): Select a destination folder in the file manager
- If not specified, data will be stored in the root directory

5. Specify Dataset

Configure which dataset to download:

Dataset Name: The identifier of the dataset on Hugging Face Hub
- Format: organization/dataset-name or just dataset-name
- Examples:
  - imdb - IMDb movie reviews
  - squad - Stanford Question Answering Dataset
  - wikitext - Wikipedia text corpus
  - glue - General Language Understanding Evaluation
Dataset Subset (Optional): Some datasets have multiple subsets or configurations
- Example: For glue dataset, you can specify mrpc, sst2, etc.
Split (Optional): Specify which split to download
- Common splits: train, test, validation
- Leave empty to download all splits

You can find dataset information at: https://huggingface.co/datasets/{dataset-name}

Enter the dataset ID in the connector's Repository ID field.
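
The naming format described in this step can be checked before the connection is created. A minimal sketch, assuming dataset IDs contain only letters, digits, `-`, `_` and `.` (an approximation of the Hub's naming rules). `parse_dataset_id` and `dataset_url` are hypothetical helpers, not part of the connector:

```python
import re
from typing import Optional, Tuple

# Assumption: an approximation of the characters the Hub allows in names.
_NAME = r"[A-Za-z0-9][A-Za-z0-9._-]*"
_DATASET_ID = re.compile(rf"^(?:({_NAME})/)?({_NAME})$")


def parse_dataset_id(dataset_id: str) -> Tuple[Optional[str], str]:
    """Split 'organization/dataset-name' or 'dataset-name' into its parts."""
    match = _DATASET_ID.match(dataset_id.strip())
    if match is None:
        raise ValueError(f"not a valid dataset ID: {dataset_id!r}")
    organization, name = match.groups()
    return organization, name


def dataset_url(dataset_id: str) -> str:
    """Build the address of the dataset page on the Hugging Face Hub."""
    organization, name = parse_dataset_id(dataset_id)
    path = f"{organization}/{name}" if organization else name
    return f"https://huggingface.co/datasets/{path}"
```

Calling `dataset_url("glue")` yields the page where the available subsets and splits are listed, which is a quick way to confirm the values for the Dataset Subset and Split fields.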

6. Configure Download Options

Download Format: Choose how to save the data
- Parquet: Efficient columnar format (recommended)
- CSV: Comma-separated values
- JSON: JavaScript Object Notation
- Arrow: Apache Arrow format
Cache: Enable caching to avoid re-downloading unchanged data
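
For comparison, the same four formats can be produced locally with the open-source `datasets` library. A sketch, assuming `split` behaves like a `datasets.Dataset` object. The `export_split` helper and the extension table are hypothetical and say nothing about how the connector itself converts files:

```python
from pathlib import Path

# Hypothetical mapping from the connector's format names to file extensions.
EXTENSIONS = {"parquet": ".parquet", "csv": ".csv", "json": ".jsonl", "arrow": ""}


def export_split(split, fmt: str, folder: str, name: str) -> Path:
    """Write one split in the requested format and return the output path.

    `split` is expected to behave like `datasets.Dataset`, which provides
    to_parquet, to_csv, to_json and save_to_disk.
    """
    fmt = fmt.lower()
    if fmt not in EXTENSIONS:
        raise ValueError(f"unsupported format: {fmt!r}")
    target = Path(folder) / f"{name}{EXTENSIONS[fmt]}"
    if fmt == "parquet":
        split.to_parquet(str(target))
    elif fmt == "csv":
        split.to_csv(str(target))
    elif fmt == "json":
        split.to_json(str(target))  # JSON Lines by default
    else:
        split.save_to_disk(str(target))  # Arrow files inside a directory
    return target
```

Parquet is usually the most compact of the four and preserves column types, which is consistent with it being the recommended option above.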

7. Create the Connection

After filling in all details, click Create Connection.

The system will:

Authenticate with Hugging Face Hub (if token provided)
Download the specified dataset
Convert to your chosen format
Begin the initial data synchronization

8. Monitor Sync Status

Navigate to Data Synchronization to see the import progress
Large datasets may take time to download and process
Progress will be shown for each split being downloaded

9. Access Your Data

Once the sync is complete, go to File Manager
Navigate to the folder you specified (or root directory)
You'll see the dataset files organized by splits (train, test, validation)
Click on any file to preview the data
The data is now ready to use in your AI pipelines and flows

What Gets Imported:

Dataset files in your chosen format
Metadata and dataset information
All specified splits (train, test, validation)
Dataset card and documentation (if available)

Best Practices:

Check dataset licenses before using in production
Use specific dataset versions/commits for reproducibility
Start with small datasets to test your pipeline
Cache datasets to avoid repeated downloads
Keep access tokens secure and never share them
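
The advice on pinning versions can be made concrete with the open-source `datasets` library. In recent versions, its `load_dataset` function accepts `name` (the subset), `split`, `revision`, and `token` arguments. A sketch under that assumption; `build_load_args` and `load_pinned` are hypothetical helpers, and the import is deferred so the argument builder works without the library installed:

```python
from typing import Any, Dict, Optional


def build_load_args(
    dataset: str,
    subset: Optional[str] = None,
    split: Optional[str] = None,
    revision: Optional[str] = None,
    token: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect only the options that were set, so library defaults still apply."""
    args: Dict[str, Any] = {"path": dataset}
    optional = {"name": subset, "split": split, "revision": revision, "token": token}
    args.update({key: value for key, value in optional.items() if value})
    return args


def load_pinned(**options: Any):
    """Load a dataset with `datasets.load_dataset` (needs `pip install datasets`)."""
    from datasets import load_dataset  # deferred: only needed for real downloads

    return load_dataset(**build_load_args(**options))
```

For example, `load_pinned(dataset="glue", subset="mrpc", split="train", revision="main")` would request one split of one subset. Replacing `"main"` with a commit hash from the dataset's repository pins the exact version.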

Popular Hugging Face Datasets:

imdb - Movie reviews sentiment analysis
squad - Question answering dataset
glue - Language understanding benchmark
conll2003 - Named entity recognition
wmt14 - Machine translation dataset
cifar10 - Image classification dataset
common_voice - Multilingual speech dataset
