Overview
This tutorial demonstrates how to:
- Set up a web crawler using Julep's Spider integration
- Process and store crawled content in a document store
- Implement RAG for enhanced AI responses
- Create an intelligent agent that can answer questions about crawled content
Task Structure
Let's break down the task into its core components:
1. Input Schema
First, we define what inputs our task expects:
- A URL string (e.g., "https://en.wikipedia.org/wiki/Artificial_intelligence")
- The number of sentences to group into each chunk, which reduces the size of the chunks list (e.g., 5)
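As a rough illustration of the two inputs described above, here is a minimal Python sketch that checks an input object before it is handed to a task. It is not Julep's own validation logic. The field name `url` follows the `_['url']` lookup mentioned later in this tutorial; `reducing_strength`, `INPUT_SCHEMA`, and `validate_input` are assumed names used only for this example.

```python
from urllib.parse import urlparse

# Assumed field names: "url" matches the _['url'] lookup used in the tutorial;
# "reducing_strength" is a hypothetical name for the sentences-per-chunk count.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "reducing_strength": {"type": "integer"},
    },
    "required": ["url", "reducing_strength"],
}

_PY_TYPES = {"string": str, "integer": int}


def validate_input(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is valid."""
    problems = []
    for name in INPUT_SCHEMA["required"]:
        if name not in data:
            problems.append(f"missing field: {name}")
    for name, spec in INPUT_SCHEMA["properties"].items():
        if name in data and not isinstance(data[name], _PY_TYPES[spec["type"]]):
            problems.append(f"{name} must be of type {spec['type']}")
    url = data.get("url")
    if isinstance(url, str) and urlparse(url).scheme not in ("http", "https"):
        problems.append("url must start with http:// or https://")
    count = data.get("reducing_strength")
    if isinstance(count, int) and count < 1:
        problems.append("reducing_strength must be at least 1")
    return problems
```

Calling `validate_input` with the example URL above and a count of 5 returns an empty list, which means the input can be passed on as is.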
2. Tools Configuration
Next, we define the external tools our task will use:
- The get_page API call for web crawling
- The create_agent_doc system tool for storing processed content
3. Main Workflow Steps
Step 1: Crawl Website
Understanding the use of the _ variable
The _ variable refers to the current context object. When the task accesses a property such as _['url'], it retrieves the value from the input parameters passed to the task.
- Takes the input URL and crawls the website
- Processes content into readable markdown format
- Chunks content into manageable segments
- Filters out unnecessary elements like images and SVGs
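To make the chunking and filtering behavior listed above more concrete, the following sketch strips markdown images and inline SVG blocks from a page, splits the text into sentences, and groups a fixed number of sentences into each chunk. It is a simplified stand-in written for this tutorial, not the crawler's actual logic: the function name `chunk_markdown`, the regular expressions, and the naive sentence splitter are all assumptions.

```python
import re

# Matches markdown images such as ![alt](path.png) and inline <svg>...</svg> blocks.
_IMAGE_OR_SVG = re.compile(
    r"!\[[^\]]*\]\([^)]*\)|<svg\b.*?</svg>", re.DOTALL | re.IGNORECASE
)


def chunk_markdown(markdown: str, sentences_per_chunk: int) -> list[str]:
    """Strip images and SVGs, split into sentences, and group them into chunks."""
    if sentences_per_chunk < 1:
        raise ValueError("sentences_per_chunk must be at least 1")
    text = _IMAGE_OR_SVG.sub("", markdown)
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i : i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]
```

A larger `sentences_per_chunk` value produces fewer, longer chunks, which is the trade-off controlled by the second task input.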
Step 2: Process and Index Content
- Processes each content chunk in parallel
- Generates contextual metadata for improved retrieval
- Prepares content for storage
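The sketch below illustrates the general shape of this step: each chunk is processed concurrently, a short context string is generated for it, and the two are combined into a record that is ready to store. The function `describe_chunk` is a hypothetical placeholder for the model call that would write the context in the real task, and `index_chunks` and the record keys are assumed names as well.

```python
from concurrent.futures import ThreadPoolExecutor


def describe_chunk(chunk: str, page_title: str) -> str:
    """Hypothetical stand-in for the model call that writes context for a chunk."""
    first_words = " ".join(chunk.split()[:6])
    return f"From the page '{page_title}', a passage beginning: {first_words}"


def index_chunks(chunks: list[str], page_title: str) -> list[dict]:
    """Attach a context string to every chunk, processing chunks concurrently."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves input order, so contexts line up with chunks.
        contexts = list(pool.map(lambda c: describe_chunk(c, page_title), chunks))
    return [
        {"content": f"{context}\n\n{chunk}", "context": context, "chunk": chunk}
        for chunk, context in zip(chunks, contexts)
    ]
```

Prepending the context to the chunk text is one common way to make an isolated passage easier to retrieve later; whether the real task stores the two together or separately is not shown in this tutorial.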
Step 3: Store Documents
- Stores processed content in the document store
- Adds metadata for source tracking
- Creates searchable documents for RAG
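To show why source metadata and searchable documents matter for RAG, here is a toy in-memory document store. It only illustrates the idea and is not how Julep's document store works: the `Document` and `InMemoryDocStore` classes, the `source` metadata key, and the word-overlap ranking are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    content: str
    metadata: dict = field(default_factory=dict)


class InMemoryDocStore:
    """Toy stand-in for an agent's document store (not Julep's implementation)."""

    def __init__(self) -> None:
        self.docs: list[Document] = []

    def add(self, title: str, content: str, source_url: str) -> Document:
        """Store a document and record where it came from."""
        doc = Document(title, content, {"source": source_url})
        self.docs.append(doc)
        return doc

    def search(self, query: str, limit: int = 3) -> list[Document]:
        """Rank documents by how many distinct query words they contain."""
        words = set(query.lower().split())
        scored = [
            (len(words & set(doc.content.lower().split())), doc) for doc in self.docs
        ]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in scored[:limit]]
```

At question time, the documents returned by a search like this would be placed in the model's prompt, and the `source` metadata lets the answer point back to the crawled page.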
Complete Task YAML
Usage
Start by creating an execution for the task. This execution will make the agent crawl the website and store the content in the document store.
Example Output
This is an example output when the agent is asked "What is Julep?"
Output
Julep is a comprehensive platform designed for creating production-ready AI systems and agents. Here are the key aspects of Julep:
Core Features:
- Complete Infrastructure Layer
  - Provides infrastructure between LLMs and software
  - Built-in support for long-term memory
  - Multi-step process management
  - State management capabilities
- AI Agent Development
  - Creates persistent AI agents that remember past interactions
  - Supports complex task execution
  - Enables multi-step workflows
  - Includes built-in tools and integrations
- Production-Ready Features
  - Automatic retries for failed steps
  - Message resending capabilities
  - Task recovery systems
  - Real-time monitoring
  - Error handling
  - Automatic scaling and load balancing
- Development Approach
  - Uses 8-Factor Agent methodology
  - Treats prompts as code with proper versioning
  - Provides clear tool interfaces
  - Offers model independence to avoid vendor lock-in
  - Includes structured reasoning capabilities
  - Maintains ground truth examples for validation
- Documentation: https://docs.julep.ai/
- API Playground: https://dev.julep.ai/api/docs
- Python SDK: https://github.com/julep-ai/python-sdk/blob/main/README.md
- JavaScript SDK: https://github.com/julep-ai/node-sdk/blob/main/README.md
- Various use case examples and cookbooks
- User Profiling
- Email Assistant
- Trip Planner
- Document Management
- Website Crawler
- Multi-step Tasks
- Advanced Chat Interactions
- Discord Community: https://discord.gg/2EUJzJU2Yt
- Book a Demo: https://calendly.com/ishita-julep
- Dev Support: hey@julep.ai
Next Steps
- To try this task yourself, check out the full example in the RAG Chatbot cookbook.
- To learn more about the integrations used in this task, check out the integrations page.