Chat with Your S3 Documents in Minutes Using aicuflow RAG
By the end of this, you'll know:
- What RAG is and why it matters
- How to connect your S3 bucket to aicuflow
- How to run a sync job to pull your files
- How to build a RAG index in one click
- How to ask questions and read relevance scores
- How to share your knowledge base with teammates
Your data is sitting in S3. Your team keeps asking you questions that are buried somewhere inside those files. Sound familiar?
RAG (Retrieval-Augmented Generation) is how you fix that. Instead of relying only on what a model memorized during training, RAG lets a language model answer from passages retrieved from your own documents, cite the source, and show how relevant each retrieved passage was.
The catch has always been setup. Chunking, embedding, vector stores, indexing pipelines — getting RAG production-ready traditionally takes days of engineering work.
With aicuflow, it takes minutes. This tutorial walks you through the entire process, from empty flow to answering complex questions about a 535 GB document archive.
Retrieval-Augmented Generation (RAG) combines a search index with a language model. When you ask a question, the system:
- Searches your indexed documents for the most relevant chunks
- Feeds those chunks as context to the language model
- Returns an answer grounded in your actual data — with source citations
The result is an AI that answers from your documents rather than from memory. Hallucinated facts become much less likely, and easier to catch, because every answer comes with the sources it was built from.
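To make the three steps concrete, here is a minimal sketch in plain Python. It is not how aicuflow works internally: it swaps real embeddings for a simple bag-of-words cosine similarity, and the file names and contents in `chunks` are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Step 1: rank chunks by similarity to the question; return (source, score) pairs."""
    q = Counter(tokenize(question))
    scored = [(source, cosine(q, Counter(tokenize(text)))) for source, text in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, chunks: dict[str, str], hits: list[tuple[str, float]]) -> str:
    """Step 2: put the retrieved chunks in front of the question as context."""
    context = "\n".join(f"[{source}] {chunks[source]}" for source, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy corpus: file names and contents are invented for this example.
chunks = {
    "training.md": "A training run produces model weights and training metrics.",
    "s3.md": "The S3 connector needs an access key, a region and a bucket name.",
    "sharing.md": "Flows can be shared with teammates.",
}
question = "What does a training run produce?"
hits = retrieve(question, chunks)
prompt = build_prompt(question, chunks, hits)
# Step 3 would send `prompt` to a language model and return its answer,
# with the sources and scores in `hits` as citations.
```

The scores in `hits` play the same role as the relevance scores you will see later in the chat: they describe how well each chunk matched the question, not whether the final answer is correct.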
#Step 1: Create a Flow
Start by creating a new flow in aicuflow. The name doesn't matter — call it anything. This is the workspace where your S3 connection, file manager, and RAG index will all live together.
#Step 2: Connect Your S3 Bucket
With your flow open, go to Data Integration and select the S3 connection tool. You'll fill in four things:
- Connection name — a label for your reference (e.g. "My Bucket")
- AWS Access Key and Secret — your standard AWS credentials
- AWS Region — critical: this must match the region your bucket is in
- Bucket name — the exact name of your S3 bucket
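Because a wrong region is an easy mistake to make, it can help to check it before filling in the form. The sketch below is one possible way to do that outside aicuflow, assuming you have boto3 installed and credentials that are allowed to call `GetBucketLocation`. The helper names `bucket_region` and `region_matches` are made up for this example; only `get_bucket_location` is a real boto3 call.

```python
def bucket_region(s3_client, bucket: str) -> str:
    """Return the region a bucket lives in, using the S3 GetBucketLocation call."""
    response = s3_client.get_bucket_location(Bucket=bucket)
    location = response.get("LocationConstraint")
    # S3 reports us-east-1 as None and may report eu-west-1 as the legacy value "EU".
    if location is None:
        return "us-east-1"
    if location == "EU":
        return "eu-west-1"
    return location


def region_matches(s3_client, bucket: str, configured_region: str) -> bool:
    """True if the region you plan to enter equals the bucket's actual region."""
    return bucket_region(s3_client, bucket) == configured_region


# Usage with boto3 (bucket name and region below are placeholders):
#   import boto3
#   client = boto3.client("s3")
#   print(region_matches(client, "my-bucket", "eu-central-1"))
```

The client is passed in as an argument so the helpers can be tried with a stub object before pointing them at a real account.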
Optional: specify a subfolder. If you leave the folder path empty, aicuflow will pull everything in the bucket. If you only want a specific directory (e.g. dogs/training/), enter it here — all files and subfolders inside will be included. You can also define file patterns to filter by extension or naming convention.
Once configured, hit Download Files and set the file limit. Leave it empty to pull everything — aicuflow handles large volumes just fine.
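If you want to predict which files a given folder path, pattern list, and file limit would select, the logic can be approximated in a few lines. This is a rough sketch of how such filters usually behave, not aicuflow's actual matching rules: the function `select_keys`, the shell-style pattern syntax, and the example keys are all assumptions.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional


def select_keys(keys: Iterable[str], folder: str = "",
                patterns: Optional[list[str]] = None,
                limit: Optional[int] = None) -> list[str]:
    """Keep keys under `folder` whose file name matches a pattern, up to `limit` keys."""
    selected: list[str] = []
    for key in keys:
        if limit is not None and len(selected) >= limit:
            break  # file limit reached
        if folder and not key.startswith(folder):
            continue  # outside the chosen subfolder
        name = key.rsplit("/", 1)[-1]
        if patterns and not any(fnmatchcase(name, p) for p in patterns):
            continue  # file name matches none of the patterns
        selected.append(key)
    return selected


# Example keys, invented for illustration.
keys = [
    "dogs/training/a.pdf",
    "dogs/training/sub/b.txt",
    "dogs/eval/c.pdf",
    "cats/d.pdf",
]
only_training = select_keys(keys, folder="dogs/training/")
only_training_pdfs = select_keys(keys, folder="dogs/training/", patterns=["*.pdf"])
```

Note the trailing slash in `dogs/training/`: with plain prefix matching, `dogs/training` would also match a sibling folder such as `dogs/training-old/`.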
#Step 3: Run the Sync Job
aicuflow immediately starts a sync job — a background process that fetches your files from S3 and brings them into the platform's file manager.
For a 535 GB archive, this completes in a matter of minutes. You can navigate to your root folder in the file manager and watch the files populate in real time. Once the sync job shows Successful, all your data is ready to index.
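The tutorial follows the sync job in the UI, but the same "wait until it says Successful" pattern is easy to script for any background job. Below is a generic polling sketch. The `get_status` callable is a placeholder you would have to supply yourself, and `"Failed"` is an assumed state name; only `"Successful"` appears in the aicuflow interface described here.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str],
                 poll_seconds: float = 5.0,
                 timeout_seconds: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it returns a terminal state or the timeout passes."""
    terminal = {"Successful", "Failed"}  # "Failed" is an assumed state name
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated job that reports "Running" twice before finishing.
_states = iter(["Running", "Running", "Successful"])
final_state = wait_for_job(lambda: next(_states), sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the example fast to try out: the simulated run above skips the real waiting.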
#Step 4: Build the RAG Index
Click the AI Search icon in the sidebar. If you haven't created a RAG index yet, you'll see a prompt to create one — click it.
aicuflow will immediately start building the index over all the files in your file manager. The time this takes depends on your document volume:
- A handful of files: under a minute
- Hundreds of large files: 10–30 minutes
You don't need to wait around. Close the browser and come back later — indexing runs in the background. When you return, you'll see every file marked as indexed.
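For a sense of what an index build typically involves, and why it scales with document volume, here is a sketch of the first stage most RAG pipelines run: splitting text into overlapping chunks. The chunk size, the overlap, and the word-based splitting are illustrative assumptions; the source does not describe how aicuflow chunks files.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Ten "words" split into chunks of four with an overlap of one.
sample = " ".join(str(i) for i in range(10))
sample_chunks = chunk_text(sample, chunk_size=4, overlap=1)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some words twice.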
From the RAG view, you can also browse a topic graph showing the concepts and connections aicuflow discovered across your documents — a useful way to explore a new dataset before you start asking questions.
#Step 5: Start Asking Questions
This is where it pays off. Switch to RAG Chat and ask anything about your documents.
Example query:
"What are the 3 nodes required to train a model in aicuflow?"
The response comes back in seconds with:
- A precise answer pulled directly from your documents
- Source citations — the exact files the answer was retrieved from
- A relevance score for each source, so you can see how closely it matched your question
The relevance score is particularly useful. A high score means the retrieved chunk was a strong match for your question. A lower score means the chunk matched only loosely, so the answer may be pieced together from related context rather than taken from a passage that addresses the question directly. Those are the answers worth cross-checking against the cited file.
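If you collect answers programmatically, one way to apply that advice is to flag any source below a cut-off. The sketch below assumes scores between 0 and 1 and a threshold of 0.5. Both are illustrative choices, and the `Source` shape is hypothetical rather than aicuflow's response format.

```python
from dataclasses import dataclass


@dataclass
class Source:
    file: str     # the cited file
    score: float  # relevance score, assumed here to lie in [0, 1]


def needs_cross_check(sources: list[Source], threshold: float = 0.5) -> list[str]:
    """Return the cited files whose relevance score falls below the threshold."""
    return [s.file for s in sources if s.score < threshold]


# Example citations with made-up scores.
cited = [Source("training.md", 0.91), Source("overview.md", 0.42)]
to_review = needs_cross_check(cited)
```

A sensible threshold depends on how the scores are distributed for your own documents, so treat 0.5 as a starting point to tune.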
Another example:
"What are the two types of output from a training run?"
Answer: model weights and training metrics — with the exact documentation page cited.
#Step 6: Share Your Knowledge Base
Your RAG index isn't just for you. From the flow, you can share the chat with teammates or colleagues. Anyone you share it with can query the same knowledge base directly — no S3 credentials required, no setup on their end.
Think of it as a shared, AI-powered knowledge base that anyone on your team can talk to.
#Step 7: Expose It as an API
Once your RAG is working, you can create an API endpoint for it — enabling you to integrate your knowledge base into other applications, internal tools, or workflows. A separate tutorial covers the different ways to use this API in practice.
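As a rough idea of what calling such an endpoint from another application could look like, here is a sketch that builds an HTTP request with Python's standard library. Everything specific in it is an assumption: the URL is a placeholder, and the `question` field and bearer-token header are guesses at a typical JSON API, not aicuflow's documented contract. Check the API tutorial for the real request format.

```python
import json
import urllib.request


def build_rag_request(endpoint_url: str, api_key: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying one question."""
    body = json.dumps({"question": question}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


# Placeholder endpoint and key; replace with the values from your own flow.
request = build_rag_request(
    "https://example.invalid/rag/query",
    "YOUR_API_KEY",
    "What are the two types of output from a training run?",
)
# To send it: urllib.request.urlopen(request, timeout=30)
```

Building the request separately from sending it makes the payload easy to inspect before any network call happens.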
#Why This Approach Is Fast
Most RAG setups require you to manage infrastructure separately: a vector database, an embedding service, a chunking pipeline, and an orchestration layer on top. Every piece adds setup time, maintenance overhead, and potential failure points.
aicuflow collapses all of that into a single workflow:
| What you'd normally build | What aicuflow does |
|---|---|
| S3 ingestion pipeline | Built-in S3 connector + sync jobs |
| File storage + management | Integrated file manager |
| Chunking + embedding + vector index | One-click RAG index builder |
| Retrieval + LLM integration | RAG chat, out of the box |
| API layer | API creation from within the flow |
From S3 credentials to your first answered question: under 10 minutes for most setups.
#Wrapping Up
RAG is one of the highest-value things you can add to any AI system — but only if your retrieval layer is actually connected to your real data. aicuflow removes the friction between your S3 bucket and a production-ready knowledge base.
The steps in order:
- Create a flow
- Add an S3 connection and run a sync job
- Build a RAG index from your files
- Start asking questions — with sources and relevance scores
- Share the knowledge base with your team or expose it as an API
If you have documents sitting in S3 that your team constantly has to manually search through, this setup will change how you work with them.