> ## Documentation Index
> Fetch the complete documentation index at: https://docs.julep.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# RAG Chatbot for Website

> Learn how to build a RAG chatbot for a website with Julep

## Overview

This tutorial demonstrates how to:

* Set up a web crawler using Julep's Spider integration
* Process and store crawled content in a document store
* Implement RAG for enhanced AI responses
* Create an intelligent agent that can answer questions about crawled content

## Task Structure

Let's break down the task into its core components:

### 1. Input Schema

First, we define what inputs our task expects:

```yaml theme={"dark"}
input_schema:
  type: object
  properties:
    url:
      type: string
    reducing_strength:
      type: integer
```

This schema specifies that our task expects:

* A URL string (e.g., "[https://en.wikipedia.org/wiki/Artificial\_intelligence](https://en.wikipedia.org/wiki/Artificial_intelligence)")
* Number of sentences to club together to reduce the size of the chunks list (e.g., 5)

### 2. Tools Configuration

Next, we define the external tools our task will use:

```yaml theme={"dark"}
- name: get_page
  type: api_call
  api_call:
    method: GET
    url: https://r.jina.ai/
    headers:
      accept: application/json
      x-return-format: markdown
      x-with-images-summary: "true"
      x-with-links-summary: "true"
      x-retain-images: "none"
      x-no-cache: "true"
      Authorization: "Bearer JINA_API_KEY"

- name: create_agent_doc
  type: system
  system:
    resource: agent
    subresource: doc
    operation: create
```

We're using two tools:

* The `get_page` api call for web crawling
* The `create_agent_doc` system tool for storing processed content

### 3. Main Workflow Steps

<Steps>
  <Step title="Crawl Website">
    ```yaml theme={"dark"}
    - tool: get_page
      arguments:
        url: $ "https://r.jina.ai/" + steps[0].input.url
      
    - evaluate:
        result: $ chunk_doc(_.json.data.content.strip())

    - workflow: index_page
      arguments:
        content: $ _.result
        document: $ steps[0].output.json.data.content.strip()
        reducing_strength: $ steps[0].input.reducing_strength
    ```

    <Accordion title="Understanding the use of the _ variable">
      The `_` variable refers to the current context object. When accessing properties like `_['url']`, it's retrieving values from the input parameters passed to the task.
    </Accordion>

    This step:

    * Takes the input URL and crawls the website
    * Processes content into readable markdown format
    * Chunks content into manageable segments
    * Filters out unnecessary elements like images and SVGs
  </Step>

  <Step title="Process and Index Content">
    ```yaml [expandable] theme={"dark"}
    - evaluate:
        document: $ _.document
        chunks: |
          $ [" ".join(_.content[i:i + max(_.reducing_strength, len(_.content) // 9)]) 
            for i in range(0, len(_.content), max(_.reducing_strength, len(_.content) // 9))]
      label: docs

    # Step 1: Create a new document and add it to the agent docs store
    - over: $ [(steps[0].input.document, chunk.strip()) for chunk in _.chunks]
      parallelism: 3
      map:
        prompt: 
        - role: user
          content: >-
            $ f'''
            <document>
            {_[0]}  
            </document>

            Here is the chunk we want to situate within the whole document
            <chunk>
            {_[1]}
            </chunk>

            Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. 
            Answer only with the succinct context and nothing else.
            '''
        unwrap: true
        settings:
          max_tokens: 16000

    - evaluate:
        final_chunks: |
          $ [
            NEWLINE.join([succint, chunk.strip()]) for chunk, succint in zip(steps['docs'].output.chunks, _)
          ]
    ```

    This step:

    * Processes each content chunk in parallel
    * Generates contextual metadata for improved retrieval
    * Prepares content for storage
  </Step>

  <Step title="Store Documents">
    ```yaml theme={"dark"}
    - over: $ _['final_chunks']
      parallelism: 3
      map:
        tool: create_agent_doc
        arguments:
          agent_id: $ agent.id
          data:
            metadata:
              source: jina_crawler
            title: Website Document
            content: $ _
    ```

    This step:

    * Stores processed content in the document store
    * Adds metadata for source tracking
    * Creates searchable documents for RAG
  </Step>
</Steps>

<Accordion title="Complete Task YAML" icon="code">
  ```yaml YAML [expandable] theme={"dark"}
  # yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/src/schemas/create_task_request.json
  name: Julep Jina Crawler Task
  description: A Julep agent that can crawl a website and store the content in the document store.

  ########################################################
  ################### INPUT SCHEMA #######################
  ########################################################

  input_schema:
    type: object
    properties:
      url:
        type: string
      reducing_strength:
        type: integer
        
  ########################################################
  ################### TOOLS ##############################
  ########################################################

  tools:
  - name: get_page
    type: api_call
    api_call:
      method: GET
      url: https://r.jina.ai/
      headers:
        accept: application/json
      x-return-format: markdown
      x-with-images-summary: "true"
      x-with-links-summary: "true"
      x-retain-images: "none"
      x-no-cache: "true"
      Authorization: "Bearer JINA_API_KEY"

  - name : create_agent_doc
    description: Create an agent doc
    type: system
    system:
      resource: agent
      subresource: doc
      operation: create

  ########################################################
  ################### INDEX PAGE SUBWORKFLOW ##############
  ########################################################

  index_page:

  # Step #0 - Evaluate the content
  - evaluate:
      document: $ _.document
      chunks: |
        $ [" ".join(_.content[i:i + max(_.reducing_strength, len(_.content) // 9)]) 
          for i in range(0, len(_.content), max(_.reducing_strength, len(_.content) // 9))]
      label: docs

  # Step #1 - Process each content chunk in parallel
  - over: "$ [(steps[0].input.content, chunk) for chunk in _['chunks']]"
    parallelism: 3
    map:
      prompt: 
      - role: user
        content: >-
          $ f'''
          <document>
          {_[0]}
          </document>

          Here is the chunk we want to situate within the whole document
          <chunk>
          {_[1]}
          </chunk>

          Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. 
          Answer only with the succinct context and nothing else.'''
      
      unwrap: true
      settings:
        max_tokens: 16000

  # Step #2 - Create a new document and add it to the agent docs store
  - evaluate:
      final_chunks: |
        $ [
          NEWLINE.join([chunk, succint]) for chunk, succint in zip(steps[1].input.chunks, _)
        ]

  # Step #3 - Create a new document and add it to the agent docs store
  - over: $ _['final_chunks']
    parallelism: 3
    map:
      tool: create_agent_doc
      arguments:
        agent_id: "$ str(agent.id)" # <--- This is the agent id of the agent you want to add the document to
        data:
          metadata:
            source: "jina_crawler"

          title: "Website Document"
          content: $ _

  ########################################################
  ################### MAIN WORKFLOW ######################
  ########################################################

  main:

  # Step 0: Get the content of the product page
  - tool: get_page
    arguments:
      url: $ "https://r.jina.ai/" + steps[0].input.url

  # Step 1: Chunk the content
  - evaluate:
      result: $ chunk_doc(_.json.data.content.strip())

  # Step 2: Evaluate step to document chunks
  - workflow: index_page
    arguments:
      content: $ _.result
      document: $ steps[0].output.json.data.content.strip()
      reducing_strength: $ steps[0].input.reducing_strength
  ```
</Accordion>

## Usage

Start by creating an execution for the task. This execution will make the agent crawl the website and store the content in the document store.

<CodeGroup>
  ```python Python [expandable] theme={"dark"}
  from julep import Client
  import time
  import yaml

  # Initialize the client
  client = Client(api_key=JULEP_API_KEY)

  # Create the agent
  agent = client.agents.create(
    name="Julep Jina Crawler Agent",
    about="A Julep agent that can crawl a website and store the content in the document store.",
  )

  # Load the task definition
  with open('crawling_task.yaml', 'r') as file:
    task_definition = yaml.safe_load(file)

  # Create the task
  task = client.tasks.create(
    agent_id=agent.id,
    **task_definition
  )

  # Create the execution
  execution = client.executions.create(
    task_id=task.id,
    input={"url": "https://en.wikipedia.org/wiki/Artificial_intelligence, "reducing_strength": 5}
  )
  # Wait for the execution to complete
  while (result := client.executions.get(execution.id)).status not in ['succeeded', 'failed']:
      print(result.status)
      time.sleep(1)

  if result.status == "succeeded":
      print(result.output)
  else:
      print(f"Error: {result.error}")
  ```

  ```js Node.js [expandable] theme={"dark"}
  import { Julep } from '@julep/sdk';
  import fs from 'fs';
  import yaml from 'yaml';

  // Initialize the client
  const client = new Julep({
    apiKey: 'your_julep_api_key'
  });

  // Create the agent
  const agent = await client.agents.create({
    name: "Julep Crawler Agent",
    about: "A Julep agent that can crawl a website and store the content in the document store.",
  });

  // Load the task definition
  const taskDefinition = yaml.parse(fs.readFileSync('crawling_task.yaml', 'utf8'));

  // Create the task
  const task = await client.tasks.create(
    agent.id,
    taskDefinition
  );

  // Create the execution
  const execution = await client.executions.create(
    task.id,
    {
      input: { 
        "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "reducing_strength": 5
      }
    }
  );

  // Wait for the execution to complete
  let result;
  while (true) {
    result = await client.executions.get(execution.id);
    if (result.status === 'succeeded' || result.status === 'failed') break;
    console.log(result.status);
    await new Promise(resolve => setTimeout(resolve, 1000));
  }

  // Print the result
  if (result.status === 'succeeded') {
    console.log(result.output);
  } else {
    console.error(`Error: ${result.error}`);
  }
  ```
</CodeGroup>

Next, create a session for the agent. This session will be used to chat with the agent.

<CodeGroup>
  ```python Python theme={"dark"}
  session = client.sessions.create(
    agent_id=AGENT_ID
  )
  ```

  ```js Node.js theme={"dark"}
  const session = await client.sessions.create({
    agentId: 'YOUR_AGENT_ID'
  });
  ```
</CodeGroup>

Finally, chat with the agent.

<CodeGroup>
  ```python Python theme={"dark"}
  response = client.sessions.chat(
    session_id=session.id,
    messages=[
      {
        "role": "user",
        "content": "tell me about artificial intelligence"
      }
    ]
  )

  print(response)
  ```

  ```js Node.js theme={"dark"}
  const response = await client.sessions.chat({
    sessionId: 'YOUR_SESSION_ID',
    messages: [{ role: 'user', content: 'tell me about artificial intelligence' }]
  });
  ```
</CodeGroup>

## Example Output

This is an example output when the agent is asked "What is Julep?"

<Accordion title="Output">
  Julep is a comprehensive platform designed for creating production-ready AI systems and agents. Here are the key aspects of Julep:

  Core Features:

  1. Complete Infrastructure Layer

  * Provides infrastructure between LLMs and software
  * Built-in support for long-term memory
  * Multi-step process management
  * State management capabilities

  2. AI Agent Development

  * Creates persistent AI agents that remember past interactions
  * Supports complex task execution
  * Enables multi-step workflows
  * Includes built-in tools and integrations

  3. Production-Ready Features

  * Automatic retries for failed steps
  * Message resending capabilities
  * Task recovery systems
  * Real-time monitoring
  * Error handling
  * Automatic scaling and load balancing

  4. Development Approach

  * Uses 8-Factor Agent methodology
  * Treats prompts as code with proper versioning
  * Provides clear tool interfaces
  * Offers model independence to avoid vendor lock-in
  * Includes structured reasoning capabilities
  * Maintains ground truth examples for validation

  Available Resources:

  * Documentation: [https://docs.julep.ai/](https://docs.julep.ai/)
  * API Playground: [https://dev.julep.ai/api/docs](https://dev.julep.ai/api/docs)
  * Python SDK: [https://github.com/julep-ai/python-sdk/blob/main/README.md](https://github.com/julep-ai/python-sdk/blob/main/README.md)
  * JavaScript SDK: [https://github.com/julep-ai/node-sdk/blob/main/README.md](https://github.com/julep-ai/node-sdk/blob/main/README.md)
  * Various use case examples and cookbooks

  You can explore different use cases through their cookbooks, including:

  * User Profiling
  * Email Assistant
  * Trip Planner
  * Document Management
  * Website Crawler
  * Multi-step Tasks
  * Advanced Chat Interactions

  For additional support or to learn more:

  * Discord Community: [https://discord.gg/2EUJzJU2Yt](https://discord.gg/2EUJzJU2Yt)
  * Book a Demo: [https://calendly.com/ishita-julep](https://calendly.com/ishita-julep)
  * Dev Support: [hey@julep.ai](mailto:hey@julep.ai)
</Accordion>

## Next Steps

* Try this task yourself, check out the full example, see the [RAG Chatbot cookbook](https://github.com/julep-ai/julep/blob/main/cookbooks/advanced/08-rag-chatbot.ipynb).
* To learn more about the integrations used in this task, check out the [integrations](/integrations/supported-integrations) page.

## Related Concepts

* [Agents](/concepts/agents)
* [Tasks](/concepts/tasks)
* [Tools](/concepts/tools)
