Browser Use

Overview

This tutorial demonstrates how to:

Set up browser automation with Julep
Navigate web pages programmatically
Execute browser actions like clicking and typing
Process visual feedback through screenshots
Create goal-oriented browser automation tasks

Task Structure

Let’s break down the task into its core components:

1. Input Schema

First, we define what inputs our task expects:

input_schema:
  type: object
  properties:
    goal:
      type: string
    agent_id:
      type: string
      description: The id of the agent to use for the browser automation
  required:
    - goal
    - agent_id

This schema specifies that our task expects a goal string describing what the browser automation should accomplish.

2. Tools Configuration

Next, we define the external tools our task will use:

tools:
- name: create_browserbase_session
  type: integration
  integration:
    provider: browserbase
    method: create_session
    setup:
      api_key: YOUR_BROWSERBASE_API_KEY
      project_id: YOUR_PROJECT_ID

- name: get_session_view_urls
  type: integration
  integration:
    provider: browserbase
    method: get_live_urls

- name: perform_browser_action
  type: integration
  integration:
    provider: remote_browser
    method: perform_action
    setup:
      width: 1024
      height: 768

- name: create_julep_session
  type: system
  system:
    resource: session
    operation: create

- name: session_chat
  type: system
  system:
    resource: session
    operation: chat

3. Main Workflow Steps

Create Julep Session

- tool: create_julep_session
  arguments:
    agent: $ str(agent.id)
    situation: "The environment is a browser"
    recall: 'False'

This step initializes a new Julep session for the AI agent. The session serves as a container for the conversation history and enables the agent to maintain context throughout the interaction.

Store Session ID

- evaluate:
    julep_session_id: $ _.id

After creating the session, we store its unique identifier for future reference.

Create Browser Session

- tool: create_browserbase_session
  arguments:
    project_id: YOUR_PROJECT_ID

This step establishes a new browser session using BrowserBase. It creates an isolated, headless Chrome browser instance that the agent can control.

Store Browser Session Info

- evaluate:
    browser_session_id: $ _.id
    connect_url: $ _.connect_url

We store both the browser session ID and connect URL in a single evaluation step.

Get Session View URLs

- tool: get_session_view_urls
  arguments:
    id: $ _.browser_session_id

This step retrieves various URLs associated with the browser session, including debugging interfaces and live view URLs.

Store Debugger URL

- evaluate:
    debugger_url: $ _.urls.debuggerUrl

We specifically store the debugger URL, which provides access to Chrome DevTools Protocol debugging interface.

Initial Navigation

- tool: perform_browser_action
  arguments:
    connect_url: $ steps[3].output.connect_url
    action: "navigate"
    text: "https://www.google.com"

This step navigates to Google’s homepage to avoid sending a blank screenshot when computer use starts.

Start Browser Workflow

- workflow: run_browser
  arguments:
    julep_session_id: $ steps[1].output.julep_session_id
    cdp_url: $ steps[3].output.connect_url
    messages:
    - role: "user"
      content: |-
        $ f"""
        <SYSTEM_CAPABILITY>
        * You are utilising a headless chrome browser to interact with the internet.
        * You can use the computer tool to interact with the browser.
        * You have access to only the browser.
        * You are already inside the browser.
        * You can't open new tabs or windows.
        * For now, rely on screenshots as the only way to see the browser.
        * You can't don't have access to the browser's UI.
        * YOU CANNOT WRITE TO THE SEARCH BAR OF THE BROWSER.
        </SYSTEM_CAPABILITY>
        <GOAL>
        * + {steps[0].input.goal} + NEWLINE + </GOAL> """

Finally, we initiate the interactive browser workflow with system capabilities and user goal.

4. Run Browser Subworkflow

The run_browser subworkflow is a crucial component that handles the interactive browser automation. It consists of three main parts:

Agent Interaction

- tool: session_chat
  arguments:
    session_id: $ _.julep_session_id
    messages: $ _.messages
    recall: $ False

- evaluate:
    content: $ _.choices[0].message.content
    tool_calls: |-
      $ [ 
        { 
          'tool_call_id': tool_call.id, 
          'action': load_json(tool_call.function.arguments)['action'], 
          'text': load_json(tool_call.function.arguments).get('text'), 
          'coordinate': load_json(tool_call.function.arguments).get('coordinate') 
        } 
        for tool_call in _.choices[0].message.tool_calls or [] if tool_call.type == 'function']

This step engages the AI agent in conversation, allowing it to:

Process and understand the user’s goal
Plan appropriate browser actions
Generate responses based on the current browser state
Make decisions about next steps

Action Execution

- foreach:
    in: $ _.tool_calls
    do:
      tool: perform_browser_action
      arguments:
        connect_url: $ steps[0].input.cdp_url
        action: $ _.action
        text: $ _.get('text')
        coordinate: $ _.get('coordinate')

This component:

Iterates through planned actions sequentially
Executes browser commands (navigation, clicking, typing)
Handles different types of interactions (text input, mouse clicks)
Captures screenshots for visual feedback

Goal Evaluation

- evaluate:
    contents: >-
      $ [ \
        { \
          'type': 'image_url', \
          'image_url': { \
            'url': result['base64_image'], \
          } \
        } if result['base64_image'] is not None else \
        { \
          'type': 'text', \
          'text': result['output'] if result['output'] is not None else 'done' \
        } \
        for result in _]

- evaluate:
    messages: "$ [{'content': [_.contents[i]], 'role': 'tool', 'name': 'computer', 'tool_call_id': steps[1].output.tool_calls[i].tool_call_id} for i in range(len(_.contents))]"

- workflow: check_goal_status
  arguments:
    messages: $ _.messages
    julep_session_id: $ steps[0].input.julep_session_id
    cdp_url: $ steps[0].input.cdp_url

This final part:

Assesses progress toward the user’s goal
Determines if additional actions are needed
Maintains conversation context
Decides whether to continue or conclude the workflow

5. Check Goal Status Subworkflow

The check_goal_status subworkflow is a recursive component that ensures continuous operation until the goal is achieved:

Check Goal Status

check_goal_status:
- if: $ len(_.messages) > 0
  then:
    workflow: run_browser
    arguments:
      messages: $ _.messages
      julep_session_id: $ _.julep_session_id
      cdp_url: $ _.cdp_url

This workflow:

Checks if there are any messages to process (len(_.messages) > 0)
If messages exist, recursively calls the run_browser workflow
Passes along the current session context and connection details
Maintains the conversation flow until the goal is achieved
Automatically terminates when no more messages need processing

This recursive pattern ensures that the browser automation continues until either:

The goal is successfully achieved
No more actions are needed
An error occurs that prevents further progress

Complete Task YAML

YAML

# yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/schemas/create_task_request.json
name: Julep Browser Use Task
description: A Julep agent that can use the computer tool to interact with the browser.

########################################################
################### INPUT SCHEMA #######################
########################################################

input_schema:
  type: object
  properties:
    goal:
      type: string
  required:
    - goal

########################################################
####################### TOOLS ##########################
########################################################

tools:
- name: create_browserbase_session
  type: integration
  integration:
    provider: browserbase
    method: create_session
    setup:
      api_key: "YOUR_BROWSERBASE_API_KEY"
      project_id: "YOUR_BROWSERBASE_PROJECT_ID"

- name: get_session_view_urls
  type: integration
  integration:
    provider: browserbase
    method: get_live_urls
    setup:
      api_key: "YOUR_BROWSERBASE_API_KEY"
      project_id: "YOUR_BROWSERBASE_PROJECT_ID"

- name: perform_browser_action
  type: integration
  integration:
    provider: remote_browser
    method: perform_action
    setup:
      width: 1024
      height: 768

- name: create_julep_session
  type: system
  system:
    resource: session
    operation: create

- name: session_chat
  type: system
  system:
    resource: session
    operation: chat

########################################################
################### MAIN WORKFLOW ######################
########################################################

main:

# Step #0 - Create Julep Session
- tool: create_julep_session
  arguments:
    agent: $ str(agent.id)
    situation: "Juelp Browser Use Agent"
    recall: 'False'

# Step #1 - Store Julep Session ID
- evaluate:
    julep_session_id: $ _.id

# Step #2 - Create Browserbase Session
- tool: create_browserbase_session
  arguments:
    project_id: "c35ee022-883e-4070-9f3c-89607393214b"

# Step #3 - Store Browserbase Session Info
- evaluate:
    browser_session_id: $ _.id
    connect_url: $ _.connect_url

# Step #4 - Get Session View URLs
- tool: get_session_view_urls
  arguments:
    id: $ _.browser_session_id

# Step #5 - Store Debugger URL
- evaluate:
    debugger_url: $ _.urls.debuggerUrl

# Step #6 - Navigate to Google
# Navigate to google to avoid sending a blank 
# screenshot when computer use starts
- tool: perform_browser_action
  arguments:
    connect_url: $ steps[3].output.connect_url
    action: "navigate"
    text: "https://www.google.com"

# Step #7 - Run Browser Workflow
- workflow: run_browser
  arguments:
    julep_session_id: $ steps[1].output.julep_session_id
    cdp_url: $ steps[3].output.connect_url
    messages:
    - role: "user"
      content: |-
        $ f"""
        <SYSTEM_CAPABILITY>
        * You are utilising a headless chrome browser to interact with the internet.
        * You can use the computer tool to interact with the browser.
        * You have access to only the browser.
        * You are already inside the browser.
        * You can't open new tabs or windows.
        * For now, rely on screenshots as the only way to see the browser.
        * You don't have access to the browser's UI.
        * YOU CANNOT WRITE TO THE SEARCH BAR OF THE BROWSER.
        </SYSTEM_CAPABILITY>
        <GOAL>
        * + {steps[0].input.goal} + NEWLINE + </GOAL> """


########################################################
################# RUN BROWSER SUBWORKFLOW #################
########################################################

run_browser:

# Step #0 - Agent Interaction
- tool: session_chat
  arguments:
    session_id: $ _.julep_session_id
    messages: $ _.messages
    recall: $ False

# Step #1 - Evaluate the response from the agent
- evaluate:
    content: $ _.choices[0].message.content
    tool_calls: |-
      $ [ 
        { 
          'tool_call_id': tool_call.id, 
          'action': load_json(tool_call.function.arguments)['action'], 
          'text': load_json(tool_call.function.arguments).get('text'), 
          'coordinate': load_json(tool_call.function.arguments).get('coordinate') 
        } 
        for tool_call in _.choices[0].message.tool_calls or [] if tool_call.type == 'function']

# Step #2 - Perform the actions requested by the agent
- foreach:
    in: $ _.tool_calls
    do:
      tool: perform_browser_action
      arguments:
        connect_url: $ steps[0].input.cdp_url
        action: $ _.action if not (str(_.get('text', '')).startswith('http') and _.action == 'type') else 'navigate'
        text: $ _.get('text')
        coordinate: $ _.get('coordinate')

# Step #3 - Convert the result of the actions into a chat message
- evaluate:
    contents: >-
      $ [ \
        { \
          'type': 'image_url', \
          'image_url': { \
            'url': result['base64_image'], \
          } \
        } if result['base64_image'] is not None else \
        { \
          'type': 'text', \
          'text': result['output'] if result['output'] is not None else 'done' \
        } \
        for result in _]

# Step #4 - Convert the result of the actions into a chat message
- evaluate:
    messages: "$ [{'content': [_.contents[i]], 'role': 'tool', 'name': 'computer', 'tool_call_id': steps[1].output.tool_calls[i].tool_call_id} for i in range(len(_.contents))]"

# Step #5 - Check if the goal is achieved and recursively run the browser
- workflow: check_goal_status
  arguments:
    messages: $ _.messages
    julep_session_id: $ steps[0].input.julep_session_id
    cdp_url: $ steps[0].input.cdp_url


########################################################
############## CHECK GOAL STATUS SUBWORKFLOW ##############
########################################################

check_goal_status:

# Step #0 - Check if the goal is achieved and recursively run the browser
- if: $ len(_.messages) > 0
  then:
    workflow: run_browser
    arguments:
      messages: $ _.messages
      julep_session_id: $ _.julep_session_id
      cdp_url: $ _.cdp_url

Usage

Here’s how to use this task with the Julep SDK:

from julep import Client
import yaml
import time

# Initialize the client
client = Client(api_key=JULEP_API_KEY)

# Create the agent
agent = client.agents.create(
  name="Julep Browser Use Agent",
  about="A Julep agent that can use the computer tool to interact with the browser.",
)

# Load the task definition
with open('browser_task.yaml', 'r') as file:
  task_definition = yaml.safe_load(file)

# Create the task
task = client.tasks.create(
  agent_id=agent.id,
  **task_definition
)

# Create the execution
execution = client.executions.create(
    task_id=task.id,
    input={
        "agent_id": agent.id,
        "goal": "Search for recent news about artificial intelligence"
    }
)

# Wait for the execution to complete
while (result := client.executions.get(execution.id)).status not in ['succeeded', 'failed']:
    print(result.status)
    time.sleep(1)

# Print the result
if result.status == "succeeded":
    print(result.output)
else:
    print(f"Error: {result.error}")

Key Features

Browser Automation: Performs web interactions like navigation, clicking, and typing
Visual Feedback: Captures screenshots to verify actions and understand page state
Goal-Oriented: Continues executing actions until the user’s goal is achieved
Secure Sessions: Uses BrowserBase for isolated browser instances
Interactive Workflow: Uses run_browser subworkflow for continuous interaction

Next Steps

Try this task yourself, check out the full example, see the browser-use cookbook.
To learn more about the integrations used in this task, check out the integrations page.

Get Started

Julep CLI

Core Concepts

Tutorials

Integrations

Advanced Topics

Overview

Task Structure

1. Input Schema

2. Tools Configuration

3. Main Workflow Steps

4. Run Browser Subworkflow

5. Check Goal Status Subworkflow

Usage

Key Features

Next Steps

Get Started

Julep CLI

Core Concepts

Tutorials

Integrations

Advanced Topics

​Overview

​Task Structure

​1. Input Schema

​2. Tools Configuration

​3. Main Workflow Steps

​4. Run Browser Subworkflow

​5. Check Goal Status Subworkflow

​Usage

​Key Features

​Next Steps

​Related Concepts

Overview

Task Structure

1. Input Schema

2. Tools Configuration

3. Main Workflow Steps

4. Run Browser Subworkflow

5. Check Goal Status Subworkflow

Usage

Key Features

Next Steps

Related Concepts