Overview

This tutorial demonstrates how to:

  • Set up browser automation with Julep
  • Navigate web pages programmatically
  • Execute browser actions like clicking and typing
  • Process visual feedback through screenshots
  • Create goal-oriented browser automation tasks

Task Structure

Let’s break down the task into its core components:

1. Input Schema

First, we define what inputs our task expects:

input_schema:
  type: object
  properties:
    goal:
      type: string
    agent_id:
      type: string
      description: The id of the agent to use for the browser automation
  required:
    - goal
    - agent_id

This schema specifies that our task expects a goal string describing what the browser automation should accomplish.

2. Tools Configuration

Next, we define the external tools our task will use:

tools:
- name: create_browserbase_session
  type: integration
  integration:
    provider: browserbase
    method: create_session
    setup:
      api_key: YOUR_BROWSERBASE_API_KEY
      project_id: YOUR_PROJECT_ID

- name: get_session_view_urls
  type: integration
  integration:
    provider: browserbase
    method: get_live_urls

- name: perform_browser_action
  type: integration
  integration:
    provider: remote_browser
    method: perform_action
    setup:
      width: 1024
      height: 768

- name: create_julep_session
  type: system
  system:
    resource: session
    operation: create

- name: session_chat
  type: system
  system:
    resource: session
    operation: chat

3. Main Workflow Steps

1

Create Julep Session

- tool: create_julep_session
  arguments:
    agent: str(agent.id)
    situation: "'The environment is a browser'"
    recall: 'False'

This step initializes a new Julep session for the AI agent. The session serves as a container for the conversation history and enables the agent to maintain context throughout the interaction.

2

Store Session ID

- evaluate:
    julep_session_id: _.id

After creating the session, we store its unique identifier for future reference.

3

Create Browser Session

- tool: create_browserbase_session
  arguments:
    project_id: YOUR_PROJECT_ID

This step establishes a new browser session using BrowserBase. It creates an isolated, headless Chrome browser instance that the agent can control.

4

Store Browser Session Info

- evaluate:
    browser_session_id: _.id
    connect_url: _.connect_url

We store both the browser session ID and connect URL in a single evaluation step.

5

Get Session View URLs

- tool: get_session_view_urls
  arguments:
    id: _.browser_session_id

This step retrieves various URLs associated with the browser session, including debugging interfaces and live view URLs.

6

Store Debugger URL

- evaluate:
    debugger_url: _.urls.debuggerUrl

We specifically store the debugger URL, which provides access to Chrome DevTools Protocol debugging interface.

7

Initial Navigation

- tool: perform_browser_action
  arguments:
    connect_url: outputs[3].connect_url
    action: "'navigate'"
    text: "'https://www.google.com'"

This step navigates to Google’s homepage to avoid sending a blank screenshot when computer use starts.

8

Start Browser Workflow

- workflow: run_browser
  arguments:
    julep_session_id: outputs[1].julep_session_id
    cdp_url: outputs[3].connect_url
    messages:
    - role: "'user'"
      content: |-
        """
        <SYSTEM_CAPABILITY>
        * You are utilising a headless chrome browser to interact with the internet.
        * You can use the computer tool to interact with the browser.
        * You have access to only the browser.
        * You are already inside the browser.
        * You can't open new tabs or windows.
        * For now, rely on screenshots as the only way to see the browser.
        * You can't don't have access to the browser's UI.
        * YOU CANNOT WRITE TO THE SEARCH BAR OF THE BROWSER.
        </SYSTEM_CAPABILITY>
        <GOAL>
        *""" + inputs[0].goal + NEWLINE + "</GOAL>"

Finally, we initiate the interactive browser workflow with system capabilities and user goal.

Run Browser Workflow

The run_browser workflow is a crucial component that handles the interactive browser automation. It consists of three main parts:

1

Agent Interaction

- tool: session_chat
  arguments:
    session_id: _.julep_session_id
    messages: _.messages
    recall: 'False'

This step engages the AI agent in conversation, allowing it to:

  • Process and understand the user’s goal
  • Plan appropriate browser actions
  • Generate responses based on the current browser state
  • Make decisions about next steps
1

Action Execution

- foreach:
    in: _.tool_calls
    do:
      tool: perform_browser_action
      arguments:
        connect_url: inputs[0].cdp_url
        action: _.action
        text: _.get('text')
        coordinate: _.get('coordinate')

This component:

  • Iterates through planned actions sequentially
  • Executes browser commands (navigation, clicking, typing)
  • Handles different types of interactions (text input, mouse clicks)
  • Captures screenshots for visual feedback
1

Goal Evaluation

- workflow: check_goal_status
  arguments:
    messages: _.messages
    julep_session_id: _.julep_session_id
    cdp_url: _.cdp_url

This final part:

  • Assesses progress toward the user’s goal
  • Determines if additional actions are needed
  • Maintains conversation context
  • Decides whether to continue or conclude the workflow

Check Goal Status Workflow

The check_goal_status workflow is a recursive component that ensures continuous operation until the goal is achieved:

1

Check Goal Status

check_goal_status:
- if: len(_.messages) > 0
  then:
    workflow: run_browser
    arguments:
      messages: _.messages
      julep_session_id: _.julep_session_id
      cdp_url: _.cdp_url
      workflow_label: "'run_browser'"

This workflow:

  • Checks if there are any messages to process (len(_.messages) > 0)
  • If messages exist, recursively calls the run_browser workflow
  • Passes along the current session context and connection details
  • Maintains the conversation flow until the goal is achieved
  • Automatically terminates when no more messages need processing

This recursive pattern ensures that the browser automation continues until either:

  • The goal is successfully achieved
  • No more actions are needed
  • An error occurs that prevents further progress

Example Usage

Here’s how to use this task with the Julep SDK:

from julep import Client

client = Client(api_key=JULEP_API_KEY)

execution = client.executions.create(
    task_id=TASK_UUID,
    input={
        "agent_id": "YOUR_AGENT_ID",
        "goal": "Search for recent news about artificial intelligence"
    }
)

Key Features

  • Browser Automation: Performs web interactions like navigation, clicking, and typing
  • Visual Feedback: Captures screenshots to verify actions and understand page state
  • Goal-Oriented: Continues executing actions until the user’s goal is achieved
  • Secure Sessions: Uses BrowserBase for isolated browser instances
  • Interactive Workflow: Uses run_browser subworkflow for continuous interaction

Next Steps

Try this task yourself, check out the full example, see the browser-use cookbook.