Overview

In this section, we’ll cover the key concepts and components of the Julep Responses API. The Julep Responses API is designed to be compatible with OpenAI’s interface, making it easy to migrate existing applications that use OpenAI’s API to Julep.

  • The Open Responses API requires self-hosting. See the installation guide below.
  • Being in Alpha, the API is subject to change. Check back frequently for updates.
  • For more context, see the OpenAI Responses API documentation.

Components

The Responses API offers a streamlined way to interact with language models with the following key components:

  • Response ID: A unique identifier (uuid7) for each response.
  • Model: The language model used to generate the response (e.g., “claude-3.5-haiku”, “gpt-4o”, etc.).
  • Input: The prompt or question sent to the model, which can be simple text or structured input.
  • Output: The generated content from the model, which can include text, tool outputs, or other structured data.
  • Status: The current status of the response (completed, failed, in_progress, incomplete).
  • Tools: Optional tools that the model can use to enhance its response.
  • Usage: Token consumption metrics for the response.

Response Configuration Options

When creating a response, you can leverage these configuration options to tailor the experience:

| Option | Type | Description | Default | Status |
|--------|------|-------------|---------|--------|
| model | string | The language model to use (e.g., “claude-3.5-haiku”, “gpt-4o”). Check out the supported models for more information. | Required | Implemented |
| input | string or array | The prompt or structured input to send to the model | Required | Implemented |
| include | array or null | Types of content to include in the response (e.g., “file_search_call.results”) | None | Partially Implemented |
| parallel_tool_calls | boolean | Whether to allow tools to be called in parallel | true | Implemented |
| store | boolean | Whether to store the response for later retrieval | true | Implemented |
| stream | boolean | Whether to stream the response as it’s generated | false | Planned |
| max_tokens | integer or null | Maximum number of tokens to generate | None | Implemented |
| temperature | number | Controls randomness in response generation (0 to 1) | 1 | Implemented |
| top_p | number | Controls diversity in token selection (0 to 1) | 1 | Implemented |
| n | integer or null | Number of responses to generate | None | Implemented |
| stop | string, array, or null | Sequence(s) where the model should stop generating | None | Implemented |
| presence_penalty | number or null | Penalty for new tokens based on presence in text so far | None | Implemented |
| frequency_penalty | number or null | Penalty for new tokens based on frequency in text so far | None | Implemented |
| logit_bias | object or null | Modify likelihood of specific tokens appearing | None | Implemented |
| user | string or null | Unique identifier for the end-user | None | Implemented |
| instructions | string or null | Additional instructions to guide the model’s response | None | Implemented |
| previous_response_id | string or null | ID of a previous response for context continuity | None | Implemented |
| reasoning | object or null | Controls reasoning effort (low/medium/high) | None | Implemented |
| text | object or null | Configures text format (text or JSON object) | None | Implemented |
| tool_choice | "auto", "none", object, or null | Controls how the model chooses which tools to use | None | Implemented |
| tools | array or null | List of tools the model can use for generating the response | None | Partially Implemented |
| truncation | "disabled", "auto", or null | How to handle context overflow | None | Planned |
| metadata | object or null | Additional metadata for the response | None | Implemented |
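For instance, a request combining several of these options might look like the following (the field values here are purely illustrative):

```json
{
  "model": "gpt-4o",
  "input": "Draft a short product description for a reusable water bottle.",
  "instructions": "Respond in a friendly, concise tone.",
  "temperature": 0.7,
  "max_tokens": 256,
  "store": true
}
```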

To learn more about the roadmap of the Responses API, check out the Roadmap page.

Input Formats

The Responses API supports various input formats to accommodate different use cases:

Simple Text Input

The simplest way to interact with the Responses API is to provide a text string as input:

{
  "input": "What are the top 5 skincare products?"
}

Structured Message Input

For more complex interactions, you can provide a structured array of messages:

{
  "input": [
    {
      "role": "user",
      "content": "Please summarize the current market trends in renewable energy."
    }
  ]
}

Multi-modal Input

The Responses API supports multi-modal inputs, allowing you to include images or files along with text:

{
  "input": [
    {
      "role": "user",
      "content": [
          {"type": "input_text", "text": "what is in this image?"},
          {
              "type": "input_image",
              "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
      ]
    }
  ]
}

Tool Usage

The Responses API supports tool usage, allowing the model to perform actions like web searches, function calls, and more to enhance its response.

{
  "input": "What are the latest advancements in quantum computing?",
  "tools": [
    {
      "type": "web_search_preview",
      "domains": ["https://www.google.com"],
      "search_context_size": "small",
      "user_location": { 
        "type": "approximate",
        "city": "YOUR_CITY",
        "country": "YOUR_COUNTRY",
        "region": "YOUR_REGION",
        "timezone": "YOUR_TIMEZONE"
      }
    }
  ]
}

Relationship to Sessions

While Sessions provide a persistent, stateful way to interact with agents over multiple turns, the Responses API offers a lightweight, stateless alternative for quick, one-off interactions with language models. Here’s how they compare:

| Feature | Sessions | Responses |
|---------|----------|-----------|
| State Management | Maintains conversation history | Stateless (with optional context from previous responses) |
| Persistence | Long-lived, for ongoing conversations | Short-lived, for one-off interactions |
| Agent Integration | Requires an agent | No agent needed |
| Setup Complexity | Requires agent and session creation | Minimal setup (just model and input) |
| Use Case | Multi-turn conversations, complex interactions | Quick content generation, processing, or reasoning |

If you need to maintain context across multiple interactions but prefer the simplicity of the Responses API, you can use the previous_response_id parameter to link responses together.
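For example, a follow-up request can reference an earlier response to carry its context forward (the response ID shown is a placeholder):

```json
{
  "model": "gpt-4o",
  "input": "Can you expand on the second point?",
  "previous_response_id": "resp_abc123"
}
```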

Response Object Structure

The Response object is the core data structure returned by the Julep Responses API. It contains all the information about a generated response and follows the OpenAI Responses API. The schema of the Response object is as follows:

| Field | Type | Description |
|-------|------|-------------|
| id | string | Unique identifier for the response |
| object | string | Always “response” |
| created_at | integer | Unix timestamp when the response was created |
| status | string | Current status: “completed”, “failed”, “in_progress”, or “incomplete” |
| error | object or null | Error information if the response failed |
| incomplete_details | object or null | Details about why a response is incomplete |
| instructions | string or null | Optional instructions provided to the model |
| max_output_tokens | integer or null | Maximum number of tokens to generate |
| model | string | The model used to generate the response |
| output | array | List of output items (messages, tool calls, reasoning) |
| parallel_tool_calls | boolean | Whether tools can be called in parallel |
| previous_response_id | string or null | ID of a previous response for context |
| reasoning | object or null | Reasoning steps if reasoning was requested |
| store | boolean | Whether the response is stored for later retrieval |
| temperature | number | Sampling temperature used (0-1) |
| text | object or null | Text formatting options |
| tool_choice | string or object | How tools are selected (“auto”, “none”, “required”) |
| tools | array | List of tools available to the model |
| top_p | number | Top-p sampling parameter (0-1) |
| truncation | string | Truncation strategy (“disabled” or “auto”) |
| usage | object | Token usage statistics |
| user | string or null | Optional user identifier |
| metadata | object | Custom metadata associated with the response |

The output array contains the actual content generated by the model, which can include text messages, tool calls (function, web search, file search, computer), and reasoning items.
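To make the schema concrete, an abridged Response object might look like the sketch below; the exact field values, and the nested shape of the output items, are illustrative and will vary by request:

```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1741476542,
  "status": "completed",
  "error": null,
  "model": "gpt-4o",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {"type": "output_text", "text": "Here is a summary of recent advancements..."}
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 48,
    "total_tokens": 60
  }
}
```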

Best Practices

Optimize Input Prompts

  1. Be Specific: Clearly define what you want the model to generate.
  2. Provide Context: Include relevant background information in your prompt.
  3. Use Examples: When appropriate, include examples of desired outputs in your prompt.

Model Selection

  1. Match Complexity: Use more capable models for complex tasks (e.g., reasoning, coding).
  2. Consider Latency: Smaller models are faster for simple tasks.
  3. Test Different Models: Compare results across models for optimal performance.

Tool Usage

  1. Provide Clear Tool Descriptions: Help the model understand when and how to use tools.
  2. Only Include Relevant Tools: Too many tools can confuse the model’s selection process.
  3. Validate Tool Outputs: Always verify the information returned from tool calls.
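As an illustration of a clear tool description, a function tool definition might look like the sketch below; the function name, description, and parameters are hypothetical:

```json
{
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get the current weather for a given city. Use this whenever the user asks about weather conditions.",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {
            "type": "string",
            "description": "City name, e.g. \"Berlin\""
          }
        },
        "required": ["city"]
      }
    }
  ]
}
```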

Next Steps