Overview

Julep provides a robust chat system with various features for dynamic interaction with agents. Here’s an overview of the key components and functionalities.

Features

1. Tool Integration

The chat API allows for the use of tools, enabling the agent to perform actions or retrieve information during the conversation.

2. Multi-agent Sessions

You can specify different agents within the same session using the agent parameter in the chat settings.

3. Response Formatting

Control the output format, including options for JSON responses with specific schemas.

4. Memory and Recall

Configure how the session accesses and stores conversation history and memories.

5. Document References

The API returns information about documents referenced during the interaction, useful for providing citations or sources.

  • To use document search (RAG) with the chat API, create a session with the recall_options parameter set to the appropriate search parameters. To learn more about the recall_options parameter, check out the Session page.
  • To use the chat API, you need to create a session first. To learn more about the session object, check out the Session page.

Input Structure

  • Messages: An array of input messages representing the conversation so far.
  • Tools: (Advanced) Additional tools provided for this specific interaction.
  • Tool Choice: Specifies which tool the agent should use.
  • Memory Access: Controls how the session accesses history and memories (the recall parameter).
  • Additional Parameters: Various parameters to control the behavior of the chat. You can find more details in the Additional Parameters section.

Here’s an example of how a typical message object might be structured in a chat interaction:
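A minimal sketch of such a message list, assuming the role/content shape used in the Usage example. The optional name field, the placeholder contents, and the is_valid_message helper are illustrative assumptions, not part of the Julep SDK.

```python
# A conversation so far, expressed as a list of message dicts.
# Each message carries a role and its content; "name" is optional.
messages = [
    {"role": "user", "content": "Tell me about Julep"},
    {"role": "assistant", "content": "(previous assistant reply)"},
    {"role": "user", "name": "alex", "content": "How do I start a chat session?"},
]


def is_valid_message(message):
    """Hypothetical local check: a message needs a string role and some content."""
    return isinstance(message.get("role"), str) and "content" in message


# Check the whole list locally before sending it to the chat API.
all_valid = all(is_valid_message(m) for m in messages)
```

The last entry in the list is the newest message; earlier entries give the agent the context of the conversation.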

Additional Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| stream | bool | Indicates if the server should stream the response as it’s generated. | False |
| stop | list[str] | Up to 4 sequences where the API will stop generating further tokens. | [] |
| seed | int | If specified, the system will make a best effort to sample deterministically for that particular seed value. | None |
| max_tokens | int | The maximum number of tokens to generate in the chat completion. | None |
| logit_bias | dict[str, float] | Modify the likelihood of specified tokens appearing in the completion. | None |
| response_format | str | Response format (set to json_object to restrict output to JSON). | None |
| agent | UUID | Agent ID of the agent to use for this interaction. (Only applicable for multi-agent sessions) | None |
| repetition_penalty | float | Number between 0 and 2.0. 1.0 is neutral, and values larger than that penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. | None |
| length_penalty | float | Number between 0 and 2.0. 1.0 is neutral, and values larger than that penalize the number of tokens generated. | None |
| min_p | float | Minimum probability, relative to the leading token, for a token to be considered. | None |
| frequency_penalty | float | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. | None |
| presence_penalty | float | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they already appear in the text so far, increasing the model’s likelihood to talk about new topics. | None |
| temperature | float | Sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. | None |
| top_p | float | An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens within the top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. | 1.0 |
| recall | bool | Whether to use the document (RAG) search or not. | True |
| save | bool | Whether this interaction should be stored in the session history or not. | True |
| remember | bool | DISABLED: Whether this interaction should form new memories or not (will be enabled in a future release). | False |
| model | str | The model to use for the chat completion. | None |
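
The documented ranges can be checked on the client before a request is sent. The sketch below is a hypothetical helper, not part of the Julep SDK: validate_chat_params and its error messages are assumptions, and only the limits stated in the table above are encoded.

```python
# Numeric ranges taken from the parameter descriptions above.
PARAM_RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "repetition_penalty": (0.0, 2.0),
    "length_penalty": (0.0, 2.0),
}

MAX_STOP_SEQUENCES = 4  # "Up to 4 sequences"


def validate_chat_params(params):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    for name, (low, high) in PARAM_RANGES.items():
        value = params.get(name)
        # None means "not set", which matches the documented defaults.
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}, got {value}")
    stop = params.get("stop") or []
    if len(stop) > MAX_STOP_SEQUENCES:
        problems.append(f"stop accepts at most {MAX_STOP_SEQUENCES} sequences, got {len(stop)}")
    return problems


ok = validate_chat_params({"temperature": 0.2, "stop": ["\n"]})
bad = validate_chat_params({"temperature": 3.0, "stop": ["a", "b", "c", "d", "e"]})
```

Here ok is an empty list, while bad reports both the out-of-range temperature and the fifth stop sequence. The server remains the authority on what it accepts; a local check like this only catches obvious mistakes early.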

Usage

To use the chat endpoint, you must always create a session first.

Here’s an example of how to use the chat API with the Julep Python SDK:

# Create a session with custom recall options
# (assumes `client`, `agent`, and `user` were created earlier)
session = client.sessions.create(
    agent=agent.id,
    user=user.id,
    recall=True,
    recall_options={
        "mode": "hybrid", # or "vector", "text"
        "num_search_messages": 4, # number of messages to search for documents
        "max_query_length": 1000, # maximum query length
        "alpha": 0.7, # weight to apply to BM25 vs Vector search results (ranges from 0 to 1)
        "confidence": 0.6, # confidence cutoff level (ranges from -1 to 1)
        "limit": 10, # limit of documents to return
        "lang": "en-US", # language to be used for text-only search
        "metadata_filter": {}, # metadata filter to apply to the search
        "mmr_strength": 0, # MMR Strength (ranges from 0 to 1)
    }
)

# Chat in the session
response = client.sessions.chat(
    session_id=session.id,
    messages=[
        {
            "role": "user",
            "content": "Tell me about Julep"
        }
    ],
    recall=True
)
print("Agent's response:", response.choices[0].message.content)
print("Searched Documents:", response.docs)

To learn more about the Session object, check out the Session page.

Check out the API reference or SDK reference (Python or JavaScript) for more details on different operations you can perform on sessions.

Response

  • Content-Type: application/json
  • Body: A MessageChatResponse object containing the full generated message(s)

Both streaming and non-streaming responses include the following fields:

  • id: The unique identifier for the chat response
  • choices: A list of generated message completions, each containing:
    • role: The role of the message (e.g. “assistant”, “user”, etc.)
    • id: Unique identifier for the message
    • content: The actual message content
    • created_at: Timestamp when the message was created
    • name: Optional name associated with the message
    • tool_call_id: Optional ID referencing a tool call
    • tool_calls: Optional list of tool calls made during message generation
  • created_at: When this resource was created, as a UTC date-time
  • docs: List of document references used for this request, intended for citation purposes
  • jobs: List of UUIDs for background jobs that may have been initiated as a result of this interaction
  • usage: Statistics on token usage for the completion request
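
As a rough illustration of reading these fields, the sketch below builds a stand-in object with SimpleNamespace instead of calling the API. The nesting of content under choices[0].message follows the Usage example; the summarize_response helper and the sample values are assumptions made for illustration.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Collect the fields most callers need from a chat response-like object."""
    first_choice = response.choices[0]
    return {
        "id": response.id,
        "text": first_choice.message.content,
        "num_docs": len(response.docs),  # documents available for citation
        "num_jobs": len(response.jobs),  # background jobs started by this call
    }


# Stand-in for a real response; the values here are made up.
fake_response = SimpleNamespace(
    id="resp-123",
    choices=[
        SimpleNamespace(message=SimpleNamespace(role="assistant", content="Hello!"))
    ],
    docs=[],
    jobs=[],
)

summary = summarize_response(fake_response)
```

With a real response object from client.sessions.chat, the same helper would be called as summarize_response(response), provided the attribute names match those listed above.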

Finish Reasons

  • stop: Natural stop point or provided stop sequence reached
  • length: Maximum number of tokens specified in the request was reached
  • content_filter: Content was omitted due to a flag from content filters
  • tool_calls: The model called a tool
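
One way to act on these values is a small lookup that maps each finish reason to a follow-up step. This is a sketch only: the suggested actions and the describe_finish_reason helper are assumptions, not behavior defined by Julep.

```python
# Suggested follow-up for each documented finish reason (illustrative wording).
FINISH_REASON_ACTIONS = {
    "stop": "Use the message as-is.",
    "length": "Output was cut off; consider raising max_tokens or asking the agent to continue.",
    "content_filter": "Content was omitted; review the prompt or show a fallback message.",
    "tool_calls": "Run the requested tool and send its result back in a follow-up message.",
}


def describe_finish_reason(reason):
    """Return a suggested action, with a fallback for values not listed above."""
    return FINISH_REASON_ACTIONS.get(reason, f"Unrecognized finish reason: {reason!r}")


length_action = describe_finish_reason("length")
unknown_action = describe_finish_reason("timeout")
```

Keeping a fallback branch for unrecognized values avoids a crash if a finish reason outside this list is ever returned.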

Support

If you need help with further questions in Julep: