The `_` variable refers to the current context object. When you access a property like `_['url']`, it retrieves the value from the input parameters passed to the task.
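As a rough illustration in plain Python (not Julep task syntax), the `$` expressions in the task are evaluated against this context object, so at the start of the workflow `_` simply holds the execution input:

```python
# Illustration only: at the first step of `main`, `_` holds the execution input,
# so `_['url']` and `_['pages_limit']` resolve to whatever values were passed
# when the execution was created.
execution_input = {"url": "https://julep.ai/", "pages_limit": 5}
_ = execution_input
assert _['url'] == "https://julep.ai/"
assert _['pages_limit'] == 5
```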
This step:

- Takes the input URL and crawls the website
- Processes the content into readable markdown format
- Chunks the content into manageable segments
- Filters out unnecessary elements like images and SVGs
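For orientation, here is a rough, hypothetical sketch of the shape this step's output takes, inferred purely from how later steps consume it (the real Spider response contains more fields):

```python
# Hypothetical shape only, inferred from the workflow below: `main` iterates
# over `_['result']` and hands each page's `content` (a list of sentence-based
# chunks) to the `index_page` subworkflow.
crawler_step_output = {
    "result": [
        {"content": ["First chunk of page 1 ...", "Second chunk of page 1 ..."]},
        {"content": ["First chunk of page 2 ..."]},
    ]
}
```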
### 2. Process and Index Content
```yaml
- evaluate:
    documents: $ _['content']

- over: $ "[(steps[0].input.content, chunk) for chunk in _['documents']]"
  parallelism: 3
  map:
    prompt:
    - role: user
      content: >-
        $ f'''
        <document>
        {_[0]}
        </document>

        <chunk>
        {_[1]}
        </chunk>

        Please give a short succinct context to situate this chunk within the overall document.'''
```
This step:

- Processes each content chunk in parallel
- Generates contextual metadata for improved retrieval
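After the contexts come back, a later step joins each chunk with its generated context before indexing (this mirrors the `final_chunks` evaluate step in the full task below). In plain Python, that combination amounts to:

```python
# Sketch of the chunk + context combination performed by the `final_chunks`
# evaluate step; the joined strings are what get stored as agent docs.
NEWLINE = "\n"
chunks = ["Julep lets you build multi-step agent workflows ..."]        # original chunks
contexts = ["This chunk comes from the homepage overview of Julep."]    # model-generated context
final_chunks = [NEWLINE.join([chunk, context]) for chunk, context in zip(chunks, contexts)]
```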
Here is the complete task definition:

```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/schemas/create_task_request.json
name: Julep Customer Support RAG Task
description: A Julep agent that can crawl a website and store the content in the document store.

################################################################################
################################# INPUT SCHEMA #################################
################################################################################
input_schema:
  type: object
  properties:
    url:
      type: string
    pages_limit:
      type: integer

################################################################################
##################################### TOOLS ####################################
################################################################################
tools:
- name: spider_crawler
  type: integration
  integration:
    provider: spider
    method: crawl
    setup:
      spider_api_key: "YOUR_SPIDER_API_KEY"

- name: create_agent_doc
  description: Create an agent doc
  type: system
  system:
    resource: agent
    subresource: doc
    operation: create

################################################################################
########################### INDEX PAGE SUBWORKFLOW #############################
################################################################################
index_page:

# Step #0 - Evaluate the content
- evaluate:
    documents: $ _['content']

# Step #1 - Process each content chunk in parallel
- over: $ "[(steps[0].input.content, chunk) for chunk in _['documents']]"
  parallelism: 3
  map:
    prompt:
    - role: user
      content: >-
        $ f'''
        <document>
        {_[0]}
        </document>

        Here is the chunk we want to situate within the whole document

        <chunk>
        {_[1]}
        </chunk>

        Please give a short succinct context to situate this chunk within the overall document
        for the purposes of improving search retrieval of the chunk.
        Answer only with the succinct context and nothing else.'''
    unwrap: true
    settings:
      max_tokens: 16000

# Step #2 - Combine each chunk with its generated context
- evaluate:
    final_chunks: |
      $ [
        NEWLINE.join([chunk, succinct]) for chunk, succinct in zip(steps[1].input.documents, _)
      ]

# Step #3 - Create a new document and add it to the agent docs store
- over: $ _['final_chunks']
  parallelism: 3
  map:
    tool: create_agent_doc
    arguments:
      agent_id: "00000000-0000-0000-0000-000000000000" # <--- The id of the agent you want to add the document to
      data:
        metadata:
          source: "spider_crawler"
        title: "Website Document"
        content: $ _

################################################################################
################################# MAIN WORKFLOW #################################
################################################################################
main:

# Step #0 - Tool call step that calls the spider_crawler tool with the url input
- tool: spider_crawler
  arguments:
    url: $ _['url'] # You can also use 'steps[0].input['url']'
    params:
      request: "smart_mode"
      limit: $ _['pages_limit'] # <--- Number of pages to crawl (taken from the task input)
      return_format: "markdown"
      proxy_enabled: "True"
      filter_output_images: "True" # <--- Exclude images from the output
      filter_output_svg: "True" # <--- Exclude SVGs from the output
      # filter_output_main_only: "True"
      readability: "True" # <--- Make the output more readable
      sitemap: "True" # <--- Crawl the sitemap as well
      chunking_alg: # <--- Use spider's bysentence algorithm to chunk the output
        type: "bysentence"
        value: "15" # <--- Number of sentences per chunk

# Step #1 - Iterate over the crawled pages and index each one via the index_page subworkflow
- foreach:
    in: $ _['result']
    do:
      workflow: index_page
      arguments:
        content: $ _.content
```
Start by creating an execution for the task. This execution will make the agent crawl the website and store the content in the document store.
```python
from julep import Client
import time
import yaml

# Initialize the client
client = Client(api_key=JULEP_API_KEY)

# Create the agent
agent = client.agents.create(
    name="Julep Crawler Agent",
    description="A Julep agent that can crawl a website and store the content in the document store.",
)

# Load the task definition
with open('crawling_task.yaml', 'r') as file:
    task_definition = yaml.safe_load(file)

# Create the task
task = client.tasks.create(
    agent_id=agent.id,
    **task_definition
)

# Create the execution
execution = client.executions.create(
    task_id=task.id,
    input={"url": "https://julep.ai/", "pages_limit": 5}
)

# Wait for the execution to complete
while (result := client.executions.get(execution.id)).status not in ['succeeded', 'failed']:
    print(result.status)
    time.sleep(1)

if result.status == "succeeded":
    print(result.output)
else:
    print(f"Error: {result.error}")
```
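Note that the task YAML hard-codes a placeholder `agent_id` in the `create_agent_doc` step. If you want the crawled documents attached to the agent you just created, one option is to patch the loaded definition before calling `client.tasks.create` (a sketch that assumes the step order shown in the YAML above):

```python
# Optional: replace the placeholder agent_id in the create_agent_doc step
# (index 3 of the index_page subworkflow) with the newly created agent's id.
task_definition["index_page"][3]["map"]["arguments"]["agent_id"] = agent.id
```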
Next, create a session for the agent. This session will be used to chat with the agent.
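A minimal sketch of those next steps with the Python SDK (the message content below is just a placeholder):

```python
# Create a session bound to the agent, then chat with it; answers can be
# grounded in the website content indexed by the task above.
session = client.sessions.create(agent=agent.id)

response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "What does Julep do?"}],
)
print(response.choices[0].message.content)
```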