Introduction

Julep’s Open Responses is a self-hosted, open-source implementation of OpenAI’s Responses API that works with any LLM backend. It provides a lightweight interface for generating content with Large Language Models (LLMs) without needing to create persistent agents or sessions.

To try it out, just run npx -y open-responses init (or uvx open-responses init) and that’s it! :)

What is Open Responses?

Julep’s Open Responses lets you run your own server that is compatible with OpenAI’s Responses API, while giving you the freedom to use alternative models like:

  • Anthropic’s Claude
  • Alibaba’s Qwen
  • Deepseek R1
  • and many others …

It’s essentially a drop-in replacement that you control, with a permissive Apache-2.0 license. As an early release, we welcome your feedback and contributions to help improve it.

Open Responses API Overview

Why Open Responses?

  • Model Flexibility: Use any LLM backend without vendor lock-in, including local model deployments
  • Self-hosted & Private: Maintain full control over your deployment on your own infrastructure (cloud or on-premise)
  • Drop-in Compatibility: Seamlessly integrates with the official Agents SDK by simply pointing it at your self-hosted URL (see the sketch after this list)
  • Easy Deployment: Quick setup via docker-compose or our CLI with minimal configuration
  • Built-in Tools: Automatic execution of tool calls (like web_search) using open & pluggable alternatives

A few things to note:

  • The Open Responses API requires self-hosting. See the installation guide below.
  • Being in alpha, the API is subject to change. Check back frequently for updates.
  • For more context, see the OpenAI Responses API documentation.
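
As a minimal sketch of the drop-in compatibility point above: the official OpenAI Agents SDK (the openai-agents Python package) can be pointed at a self-hosted server by overriding its default client. The URL and RESPONSE_API_KEY value below are assumptions taken from the installation steps later on this page, not fixed defaults:

import os

from agents import Agent, Runner, set_default_openai_client
from openai import AsyncOpenAI

# Assumption: Open Responses is listening on localhost:8080 and
# RESPONSE_API_KEY matches the key in your .env file.
client = AsyncOpenAI(
    base_url="http://localhost:8080/",
    api_key=os.getenv("RESPONSE_API_KEY"),
)

# use_for_tracing=False keeps the SDK from exporting traces with this key
set_default_openai_client(client, use_for_tracing=False)

agent = Agent(name="Assistant", instructions="You are a helpful assistant.")
result = Runner.run_sync(agent, "Write a haiku about self-hosting.")
print(result.final_output)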

Local Installation

This section will guide you through setting up Julep’s Open Responses API.

Prerequisites

Install Docker (including the Docker Compose plugin)

Installation

Julep’s Open Responses API has a fully microservice-based architecture. It is fully dockerized and can be deployed on any infrastructure that supports Docker. There are two ways to install the API:

Docker Installation

1. Create a directory for the project

mkdir julep-responses-api

2. Navigate to the project directory

cd julep-responses-api

3. Download and edit the environment variables

wget https://u.julep.ai/responses-env.example -O .env

Edit the .env file with your own values.
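
The exact variables come from the example file you just downloaded; as a rough sketch (names other than RESPONSE_API_KEY are illustrative assumptions, not guaranteed to match the example file):

# Key that clients will use to authenticate against your server
RESPONSE_API_KEY=your-secret-key

# Keys for the model providers you want to route to (illustrative)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...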

4. Download the Docker Compose file

wget https://u.julep.ai/responses-compose.yaml -O docker-compose.yml

This saves the Compose file as docker-compose.yml in the current directory; it defines the containers that Docker Compose will run.

5. Run the Docker containers

docker compose up --watch

This starts the containers in watch mode, so Compose automatically syncs or rebuilds services when watched files change.

6. Verify that the containers are running

docker ps

The Open Responses containers should appear in the list with a status of Up.

CLI Installation

The CLI is a lightweight alternative for those who prefer not to work with Docker Compose directly. Internally, it still uses Docker to run the containers.

1. Install the CLI

You can run the CLI directly with npx or install it globally with npm:

# Using npx directly
npx open-responses

# Or install globally
npm install -g open-responses

2. Set up the environment variables

npx open-responses setup

You must run the setup command before using any other commands.

3. Run the CLI

npx open-responses start

This will start the API in watch mode.

To learn more about the CLI, check out the CLI Documentation.

Quickstart Example

With Open Responses running, you can use the official OpenAI SDK, pointed at your self-hosted server, to generate content.

API Key Configuration

  • RESPONSE_API_KEY is the API key that you set in the .env file.

Model Selection

  • When using models from providers other than OpenAI, you may need to add the provider/ prefix to the model name (for example, a LiteLLM-style identifier such as anthropic/claude-3-5-sonnet-latest).
  • For supported providers, see the LiteLLM Providers documentation.

Environment Setup

  • Add the relevant provider keys to the .env file to use their respective models.

1. Install the OpenAI SDK

pip install openai

2. Initialize the OpenAI client

import os
from openai import OpenAI

# RESPONSE_API_KEY comes from your .env file
openai_client = OpenAI(base_url="http://localhost:8080/", api_key=os.getenv("RESPONSE_API_KEY"))

3. Generate a response

import os
from openai import OpenAI

# Client pointed at the self-hosted Open Responses server
openai_client = OpenAI(base_url="http://localhost:8080/", api_key=os.getenv("RESPONSE_API_KEY"))

# Create a response; the server routes the request to the selected model
response = openai_client.responses.create(
    model="gpt-4o-mini",
    input="How many people live in the world?"
)

# The generated text lives in the first content part of the first output item
print("Generated response:", response.output[0].content[0].text)
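
Because requests are routed through LiteLLM, switching providers is just a matter of changing the model name, as described under Model Selection above. A sketch, assuming you have added an Anthropic key to your .env file; the model identifier below is illustrative, so substitute one your deployment actually supports:

# Same client as above; only the model name changes.
response = openai_client.responses.create(
    model="anthropic/claude-3-5-sonnet-latest",
    input="Summarize the benefits of self-hosting an LLM gateway."
)
print("Generated response:", response.output[0].content[0].text)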

Next Steps

You’ve got Open Responses running. Here’s what to explore next: