> ## Documentation Index
> Fetch the complete documentation index at: https://docs.julep.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Chat Features in Julep

> Learn about the robust chat system and its various features for dynamic interaction with agents

## Overview

Julep provides a robust chat system with various features for dynamic interaction with agents. Here's an overview of the key components and functionalities.

## Features

<Steps>
  <Step title="Tool Integration">
    The chat API allows for the use of tools, enabling the agent to perform actions or retrieve information during the conversation.
  </Step>

  <Step title="Multi-agent Sessions">
    You can specify different agents within the same session using the `agent` parameter in the chat settings.
  </Step>

  <Step title="Response Formatting">
    Control the output format, including options for JSON responses with specific schemas.
  </Step>

  <Step title="Memory and Recall">
    Configure how the session accesses and stores conversation history and memories.
  </Step>

  <Step title="Document References">
    The API returns information about documents referenced during the interaction, useful for providing citations or sources.
  </Step>
</Steps>

<Info>
  <h4>Prerequisites for Using Chat API</h4>

  <ul>
    <li><strong>Session Creation</strong>: Before using the chat API, you must create a session first. Learn more about the session object on the <a href="/concepts/sessions">Session</a> page.</li>
    <li><strong>Document (RAG) Integration</strong>: To use Document (RAG) capabilities with the chat API, create a session with the <code>recall\_options</code> parameter configured with appropriate search parameters. For details on configuring <code>recall\_options</code>, see the <a href="/concepts/sessions#recall-options-rag-search">Session: Recall Options</a> documentation.</li>
  </ul>
</Info>

## Input Structure

* **Messages**: An array of input messages representing the conversation so far.
* **Tools**: (Advanced) Additional tools provided for this specific interaction.
* **Tool Choice**: Specifies which tool the agent should use.
* **Memory Access**: Controls how the session accesses history and memories.(`recall` parameter)
* **Additional Parameters**: Various parameters to control the behavior of the chat. You can find more details in the [Additional Parameters](#additional-parameters) section.

Here's an example of how a typical message object might be structured in a chat interaction:

<Accordion title="Message Object Structure">
  ```python Python theme={"dark"}
    """
    Attributes for the Message object:
        role (Literal["user", "assistant", "system", "tool"]): The role of the message sender.
        tool_call_id (str | None): Optional identifier for a tool call associated with this message.
        content (Annotated[str | list[str] | list[Content | ContentModel7 | ContentModel] | None, Field(...)]): The main content of the message, which can be a string, a list of strings, or a list of content models.
        name (str | None): Optional name associated with the message.
        tool_calls (list[ChosenFunctionCall | ChosenComputer20241022 | ChosenTextEditor20241022 | ChosenBash20241022] | None): List of tool calls generated during the message creation, if any.
    """
    # Example of a simple message structure
    messages = [{"role": "user", "content": "Your query here"}]
  ```

  <p>This object represents a message in the chat system, detailing the structure and types of data it can hold.</p>
</Accordion>

## Additional Parameters

| Parameter            | Type               | Description                                                                                                                                                                                                                                  | Default |
| -------------------- | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
| `stream`             | `bool`             | Indicates if the server should stream the response as it's generated.                                                                                                                                                                        | `False` |
| `stop`               | `list[str]`        | Up to 4 sequences where the API will stop generating further tokens.                                                                                                                                                                         | `[]`    |
| `seed`               | `int`              | If specified, the system will make a best effort to sample deterministically for that particular seed value.                                                                                                                                 | `None`  |
| `max_tokens`         | `int`              | The maximum number of tokens to generate in the chat completion.                                                                                                                                                                             | `None`  |
| `logit_bias`         | `dict[str, float]` | Modify the likelihood of specified tokens appearing in the completion.                                                                                                                                                                       | `None`  |
| `response_format`    | `str`              | Response format (set to `json_object` to restrict output to JSON).                                                                                                                                                                           | `None`  |
| `agent`              | `UUID`             | Agent ID of the agent to use for this interaction. (Only applicable for multi-agent sessions)                                                                                                                                                | `None`  |
| `repetition_penalty` | `float`            | Number between 0 and 2.0. 1.0 is neutral and values larger than that penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.                           | `None`  |
| `length_penalty`     | `float`            | Number between 0 and 2.0. 1.0 is neutral and values larger than that penalize number of tokens generated.                                                                                                                                    | `None`  |
| `min_p`              | `float`            | Minimum probability compared to leading token to be considered.                                                                                                                                                                              | `None`  |
| `frequency_penalty`  | `float`            | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.                                                   | `None`  |
| `presence_penalty`   | `float`            | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.                                                   | `None`  |
| `temperature`        | `float`            | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.                                                         | `None`  |
| `top_p`              | `float`            | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. | `1.0`   |
| `recall`             | `bool`             | Whether to use the document (RAG) search or not                                                                                                                                                                                              | `True`  |
| `save`               | `bool`             | Whether this interaction should be stored in the session history or not                                                                                                                                                                      | `True`  |
| `remember`           | `bool`             | DISABLED: Whether this interaction should form new memories or not (will be enabled in a future release)                                                                                                                                     | `False` |
| `model`              | `str`              | The model to use for the chat completion.                                                                                                                                                                                                    | `None`  |
| `metadata`           | `dict[str, Any]`   | Custom metadata that can be passed to the system template for dynamic behavior. See [System Templates](/advanced/system-templates) for details.                                                                                              | `None`  |
| `auto_run_tools`     | `bool`             | Whether to automatically execute tools and send the results back to the model (requires tools on the agent)                                                                                                                                  | `False` |
| `recall_tools`       | `bool`             | Whether to include tool requests and responses when recalling messages from history                                                                                                                                                          | `True`  |

## Usage

Here's an example of how to use the chat API in Julep using the SDKs:

<Info>
  To use the Chat endpint, you always have to create a session first.
</Info>

<CodeGroup>
  ```python Python [expandable] theme={"dark"}
  # Create a session with custom recall options
  client.sessions.create(
      agent=agent.id,
      user=user.id,
      recall=True,
      recall_options={
          "mode": "hybrid", # or "vector", "text"
          "num_search_messages": 4, # number of messages to search for documents
          "max_query_length": 1000, # maximum query length
          "alpha": 0.7, # weight to apply to BM25 vs Vector search results (ranges from 0 to 1)
          "confidence": 0.6, # confidence cutoff level (ranges from -1 to 1)
          "limit": 10, # limit of documents to return
          "lang": "en-US", # language to be used for text-only search
          "metadata_filter": {}, # metadata filter to apply to the search
          "mmr_strength": 0, # MMR Strength (ranges from 0 to 1)
      }
  )

  # Chat in the session
  response = client.sessions.chat(
      session_id=session.id,
      messages=[
          {
              "role": "user",
              "content": "Tell me about Julep"
          }
      ],
      recall=True
  )
  print("Agent's response:", response.choices[0].message.content)
  print("Searched Documents:", response.docs)

  # Chat with metadata for dynamic system prompts
  response_with_metadata = client.sessions.chat(
      session_id=session.id,
      messages=[
          {
              "role": "user",
              "content": "Help me with Python"
          }
      ],
      metadata={
          "language": "Spanish",
          "expertise_level": "beginner",
          "custom_instructions": "Use simple examples"
      }
  )
  # The metadata will be available in the system template as {{metadata.language}}, etc.

  # Chat with structured JSON response
  response_json = client.sessions.chat(
      session_id=session.id,
      messages=[
          {
              "role": "user",
              "content": "Analyze the sentiment of: 'I love using Julep, it makes AI development so much easier!'"
          }
      ],
      response_format={"type": "json_object"},
      # When using response_format, instruct the model to return JSON
      settings={
          "temperature": 0.3
      }
  )
  # Response will be valid JSON that can be parsed
  import json
  sentiment_data = json.loads(response_json.choices[0].message.content)
  ```

  ```javascript Node.js [expandable] theme={"dark"}
  client.sessions.create({
      agent: agent.id,
      user: user.id,
      recall: true,
      recall_options: {
          mode: "hybrid", // or "vector", "text"
          num_search_messages: 4, // number of messages to search for documents
          max_query_length: 1000, // maximum query length
          alpha: 0.7, // weight to apply to BM25 vs Vector search results (ranges from 0 to 1)
          confidence: 0.6, // confidence cutoff level (ranges from -1 to 1)
          limit: 10, // limit of documents to return
          lang: "en-US", // language to be used for text-only search
          metadata_filter: {}, // metadata filter to apply to the search
          mmr_strength: 0, // MMR Strength (ranges from 0 to 1)
      }
  });

  // Chat in the session
  const response = await client.sessions.chat({
      session_id: session.id,
      messages: [
          {
              role: "user",
              content: "Tell me about Julep"
          }
      ],
      recall: true
  });

  // Chat with metadata for dynamic system prompts
  const responseWithMetadata = await client.sessions.chat({
      session_id: session.id,
      messages: [
          {
              role: "user",
              content: "Help me with Python"
          }
      ],
      metadata: {
          language: "Spanish",
          expertise_level: "beginner",
          custom_instructions: "Use simple examples"
      }
  });
  // The metadata will be available in the system template as {{metadata.language}}, etc.

  // Chat with structured JSON response
  const responseJson = await client.sessions.chat({
      session_id: session.id,
      messages: [
          {
              role: "user",
              content: "Analyze the sentiment of: 'I love using Julep, it makes AI development so much easier!'"
          }
      ],
      response_format: { type: "json_object" },
      // When using response_format, instruct the model to return JSON
      settings: {
          temperature: 0.3
      }
  });
  // Response will be valid JSON that can be parsed
  const sentimentData = JSON.parse(responseJson.choices[0].message.content);
  ```
</CodeGroup>

To learn more about the Session object, check out the [Session](/concepts/sessions) page.

<Tip>
  Check out the [API reference](/api-reference/sessions/chat) or SDK reference ([Python](/sdks/python/reference#sessions) or [JavaScript](/sdks/nodejs/reference#sessions)) for more details on different operations you can perform on sessions.
</Tip>

## Response

<Tabs>
  <Tab title="Complete Response">
    * **Content-Type**: `application/json`
    * **Body**: A `MessageChatResponse` object containing the full generated message(s)
  </Tab>

  <Tab title="Streamed Response">
    * **Content-Type**: `text/event-stream`
    * **Body**: A stream of `ChatOutputChunk` objects sent as Server-Sent Events (SSE)

    To enable streaming, set `stream: true` in your chat request:

    <CodeGroup>
      ```python Python theme={"dark"}
      from julep import AsyncClient

      # Streaming requires async client
      async_client = AsyncClient(api_key="your-api-key")

      response = await async_client.sessions.chat(
          session_id=session.id,
          messages=[{"role": "user", "content": "Tell me a story"}],
          stream=True
      )

      async for chunk in response:
          if chunk.choices[0].delta.content:
              print(chunk.choices[0].delta.content, end="")
      ```

      ```javascript Node.js theme={"dark"}
      const response = await client.sessions.chat({
          session_id: session.id,
          messages: [{ role: "user", content: "Tell me a story" }],
          stream: true
      });

      for await (const chunk of response) {
          process.stdout.write(chunk.choices[0].delta.content);
      }
      ```
    </CodeGroup>

    <Note>
      Streaming is not available when `auto_run_tools=true`. The system will return an error if both options are enabled.
    </Note>
  </Tab>
</Tabs>

Both types of responses include the following fields:

* `id`: The unique identifier for the chat response
* `choices`: An object of generated message completions containing:
  * `role`: The role of the message (e.g. "assistant", "user", etc.)
  * `id`: Unique identifier for the message
  * `content`: list of actual message content
  * `created_at`: Timestamp when the message was created
  * `name`: Optional name associated with the message
  * `tool_call_id`: Optional ID referencing a tool call
  * `tool_calls`: Optional list of tool calls made during message generation
  * `created_at`: When this resource was created as UTC date-time
  * `docs`: List of document references used for this request, intended for citation purposes
  * `jobs`: List of UUIDs for background jobs that may have been initiated as a result of this interaction
  * `usage`: Statistics on token usage for the completion request

## Automatic Tool Calling

Julep supports automatic tool execution during chat interactions. This feature allows tools to be executed seamlessly without manual intervention, making conversations more fluid and responsive.

### How It Works

1. **When `auto_run_tools=true`**:
   * The model identifies when a tool should be used based on the conversation
   * The tool is automatically executed by Julep's backend
   * Results are fed back to the model to continue the conversation
   * The entire process happens in a single API call

2. **When `auto_run_tools=false` (default)**:
   * The model returns tool call requests in the response
   * Your application must execute the tools manually
   * Results need to be sent back in a follow-up message

### Example with Automatic Tool Execution

<CodeGroup>
  ```python Python theme={"dark"}
  # Agent with a weather tool
  agent = client.agents.create(
      name="Weather Assistant",
      tools=[
          {
              "name": "get_weather",
              "type": "integration",
              "integration": {"provider": "weather"}
          }
      ]
  )

  # Chat with automatic tool execution
  response = client.sessions.chat(
      session_id=session.id,
      messages=[
          {
              "role": "user",
              "content": "What's the weather like in Tokyo and Paris?"
          }
      ],
      auto_run_tools=True  # Tools execute automatically
  )

  # The response already includes the weather data
  print(response.choices[0].message.content)
  # Output: "In Tokyo, it's currently 22°C with clear skies. 
  #          In Paris, it's 15°C with light rain."
  ```

  ```javascript Node.js theme={"dark"}
  // Agent with a weather tool
  const agent = await client.agents.create({
      name: "Weather Assistant",
      tools: [
          {
              name: "get_weather",
              type: "integration",
              integration: {provider: "weather"}
          }
      ]
  });

  // Chat with automatic tool execution
  const response = await client.sessions.chat({
      session_id: session.id,
      messages: [
          {
              role: "user",
              content: "What's the weather like in Tokyo and Paris?"
          }
      ],
      auto_run_tools: true  // Tools execute automatically
  });

  // The response already includes the weather data
  console.log(response.choices[0].message.content);
  // Output: "In Tokyo, it's currently 22°C with clear skies. 
  //          In Paris, it's 15°C with light rain."
  ```
</CodeGroup>

### Tool History Management

The `recall_tools` parameter controls whether tool calls and their results are included when recalling conversation history:

* **`recall_tools=true` (default)**: Tool interactions are preserved in the conversation history
* **`recall_tools=false`**: Tool calls and results are excluded from recalled messages

This is useful when you want to maintain a cleaner conversation history without the technical details of tool executions.

## Finish Reasons

<CardGroup cols={2}>
  <Card title="stop" icon="stop">
    Natural stop point or provided stop sequence reached
  </Card>

  <Card title="length" icon="ruler">
    Maximum number of tokens specified in the request was reached
  </Card>

  <Card title="content_filter" icon="filter">
    Content was omitted due to a flag from content filters
  </Card>

  <Card title="tool_calls" icon="wrench">
    The model called a tool
  </Card>
</CardGroup>

## Support

If you need help with further questions in Julep:

* Join our [Discord community](https://discord.com/invite/JTSBGRZrzj)
* Check the [GitHub repository](https://github.com/julep-ai/julep)
* Contact support at [hey@julep.ai](mailto:hey@julep.ai)