RAG Chatbot for Website
Learn how to build a RAG chatbot for a website with Julep
Overview
This tutorial demonstrates how to:
- Set up a web crawler using Julep’s Spider integration
- Process and store crawled content in a document store
- Implement RAG for enhanced AI responses
- Create an intelligent agent that can answer questions about crawled content
Task Structure
Let’s break down the task into its core components:
1. Input Schema
First, we define what inputs our task expects:
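A minimal sketch of the schema, assuming Julep's YAML task syntax (the `reducing_strength` property name is illustrative):

```yaml
input_schema:
  type: object
  properties:
    url:
      type: string
      description: The URL of the website to crawl
    reducing_strength:
      type: integer
      description: Number of sentences to group together into a single chunk
  required:
  - url
```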
This schema specifies that our task expects:
- A URL string (e.g., “https://en.wikipedia.org/wiki/Artificial_intelligence”)
- The number of sentences to group into each chunk, which reduces the size of the chunk list (e.g., 5)
2. Tools Configuration
Next, we define the external tools our task will use:
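A sketch of the tool definitions, assuming Julep's YAML tool syntax; the reader endpoint URL is a placeholder:

```yaml
tools:
- name: get_page
  type: api_call
  api_call:
    method: GET
    url: https://example-reader.dev/   # placeholder endpoint that returns page content as markdown
    headers:
      accept: application/json

- name: create_agent_doc
  type: system
  system:
    resource: agent
    subresource: doc
    operation: create
```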
We're using two tools:
- The `get_page` API call tool for crawling web pages
- The `create_agent_doc` system tool for storing processed content
3. Main Workflow Steps
Crawl Website
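A sketch of this step, assuming Julep's workflow syntax; the `$`-prefixed Python expressions are illustrative, and `chunk_content` is a hypothetical stand-in for the full chunking expression:

```yaml
main:
# Fetch the page content through the get_page tool
- tool: get_page
  arguments:
    url: $ steps[0].input.url

# Strip images/SVGs from the returned markdown and group sentences into
# chunks of `reducing_strength` sentences
# (chunk_content is a hypothetical helper standing in for that logic)
- evaluate:
    chunks: $ chunk_content(_.json["content"], steps[0].input.reducing_strength)
```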
This step:
- Takes the input URL and crawls the website
- Processes content into readable markdown format
- Chunks content into manageable segments
- Filters out unnecessary elements like images and SVGs
Process and Index Content
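A sketch of the parallel processing step; the `over`/`map` structure follows Julep's map-reduce step, and the prompt wording is illustrative:

```yaml
# Process each chunk in parallel and prepend a short contextual summary
# that situates the chunk within the overall page (improves retrieval)
- over: $ _.chunks
  parallelism: 3
  map:
    prompt:
    - role: user
      content: >-
        $ f"""Write one sentence of context that situates the following
        chunk within the crawled page, then repeat the chunk verbatim:

        {_}"""
    unwrap: true
```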
This step:
- Processes each content chunk in parallel
- Generates contextual metadata for improved retrieval
- Prepares content for storage
Store Documents
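A sketch of the storage step; the `create_agent_doc` arguments and metadata fields shown here are illustrative:

```yaml
# Store every processed chunk as an agent document so the agent's
# RAG search can retrieve it later
- over: $ _  # list of processed chunks from the previous step
  map:
    tool: create_agent_doc
    arguments:
      agent_id: $ agent.id   # assumes agent.id is available in the expression context
      data:
        title: Crawled website content
        content: $ _
        metadata:
          source: website_crawler   # source tracking
```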
This step:
- Stores processed content in the document store
- Adds metadata for source tracking
- Creates searchable documents for RAG
Usage
Start by creating an execution for the task. This execution will make the agent crawl the website and store the content in the document store.
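A minimal sketch using the Julep Python SDK; it assumes the agent and task have already been created, and the placeholder IDs, input field names, and status values may differ from your setup:

```python
import time
from julep import Julep

client = Julep(api_key="YOUR_JULEP_API_KEY")
TASK_ID = "YOUR_TASK_ID"  # returned by client.tasks.create(...)

# Kick off the crawl-and-index workflow
execution = client.executions.create(
    task_id=TASK_ID,
    input={
        "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "reducing_strength": 5,
    },
)

# Poll until the execution finishes
result = client.executions.get(execution.id)
while result.status not in ("succeeded", "failed"):
    time.sleep(5)
    result = client.executions.get(execution.id)
print(result.status)
```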
Next, create a session for the agent. This session will be used to chat with the agent.
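Continuing from the snippet above (a sketch; the parameter name for the agent may vary across SDK versions):

```python
AGENT_ID = "YOUR_AGENT_ID"  # returned by client.agents.create(...)

# Open a chat session with the agent that owns the crawled documents
session = client.sessions.create(agent=AGENT_ID)
```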
Finally, chat with the agent.
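Still continuing from the snippets above; the response shape is assumed to follow the SDK's chat format:

```python
# Ask a question; the agent retrieves relevant crawled documents (RAG)
# before answering
response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "What is Julep?"}],
)
print(response.choices[0].message.content)  # field names may differ slightly by SDK version
```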
Example Output
This is an example output when the agent is asked “What is Julep?”
Next Steps
To try this task yourself, check out the full example in the RAG Chatbot cookbook.