Overview

Documents in Julep provide a way to store and retrieve information that can be used by agents. This section covers how to work with documents effectively.

Components

Documents in Julep consist of several key components that enable efficient storage and retrieval of information.

  • Title: A title that identifies the document and helps organize it.
  • Content: The textual content of the document.
  • Embeddings (automatically generated): Vector representations of the text that enable semantic search. They are generated with OpenAI's text-embedding-3-large model.
  • Metadata: Additional context attached to the document, which can also be used to filter it.

Docs Configuration Options

When creating a doc, the following attributes can be specified:

  • title (string, required): The title of the document.
  • content (string, required): The content of the document.
  • metadata (object, default null): Additional metadata for the document, such as preferences or settings.

How to Use Docs

Creating a Doc

Documents are attached to either an agent or a user. To create an agent doc with Julep's SDKs, call client.agents.docs.create with the agent_id of the owning agent and the fields listed above.

To create a user doc, replace client.agents.docs.create with client.users.docs.create, and the agent_id argument with user_id.
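A minimal sketch of this pattern in Python follows. The create_doc helper is hypothetical and not part of Julep's SDK: it routes the call to client.agents.docs.create or client.users.docs.create depending on the owner, as described above. It assumes both SDK methods accept title, content, and metadata as keyword arguments alongside agent_id or user_id; check the SDK reference for the exact signature. The client is passed in as a parameter, so the sketch works with any object of the same shape.

```python
from typing import Any, Optional


def create_doc(
    client: Any,
    *,
    title: str,
    content: str,
    agent_id: Optional[str] = None,
    user_id: Optional[str] = None,
    metadata: Optional[dict] = None,
) -> Any:
    """Hypothetical helper: attach a doc to exactly one agent or one user.

    Assumes client.agents.docs.create / client.users.docs.create accept
    title, content and metadata as keyword arguments.
    """
    if (agent_id is None) == (user_id is None):
        raise ValueError("pass exactly one of agent_id or user_id")
    fields = {"title": title, "content": content, "metadata": metadata}
    if agent_id is not None:
        # Agent-owned doc.
        return client.agents.docs.create(agent_id=agent_id, **fields)
    # User-owned doc: same call shape, with user_id instead of agent_id.
    return client.users.docs.create(user_id=user_id, **fields)
```

With the real SDK, client would be your Julep client instance, and a call might look like create_doc(client, agent_id=agent_id, title="Refund policy", content=policy_text).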

Chunking

In Julep, documents are not automatically chunked. We recommend that developers handle chunking based on their specific use case requirements, as different applications may have unique needs for how documents should be divided.

For those who need assistance with chunking, we provide a utility function chunk_doc that can be used directly in task steps. For implementation details, you can check the source code for this method in this file.
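One common strategy is a fixed-size sliding window over words, with some overlap between neighboring chunks so context is not lost at the boundaries. The sketch below shows that strategy; chunk_words is a hypothetical helper written for illustration and is not the chunk_doc utility, whose actual behavior is defined in its source.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Hypothetical sliding-window chunker (not Julep's chunk_doc).

    Splits text into chunks of at most max_words words; consecutive chunks
    share `overlap` words so context is not cut off at chunk boundaries.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each resulting chunk could then be stored as its own doc, for example with a title such as "Refund policy (part 2)".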

Full-Text Search

Our full-text search uses Timescale's indexing to match user-provided keywords and phrases against document content, returning relevant results even when the wording varies. Key features include prefix and infix search, morphology processing (stemming and lemmatization), fuzzy matching to tolerate typos, and exact result counts.

Parameters:

  • text (str): The textual query to search within documents.
  • metadata_filter (object, optional): Filters to apply based on document metadata.
  • lang (str, default 'en-US'): The language to use for full-text search processing.
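In Julep this matching happens server-side on Timescale's indexes. Purely to illustrate two of the features listed above, the toy below matches query terms by prefix ("embed" matches "embeddings") and tolerates typos using a similarity ratio from Python's difflib. It does no stemming, lemmatization, or indexing, and the names text_search and term_matches and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def term_matches(query_term: str, doc_term: str, fuzz: float = 0.8) -> bool:
    # Prefix match, or a near-miss (typo) match above the similarity threshold.
    if doc_term.startswith(query_term):
        return True
    return SequenceMatcher(None, query_term, doc_term).ratio() >= fuzz


def text_search(query: str, docs: dict[str, str]) -> list[tuple[str, int]]:
    """Toy keyword search: score = number of query terms matched in a doc."""
    terms = query.lower().split()
    results = []
    for doc_id, content in docs.items():
        words = content.lower().split()
        score = sum(any(term_matches(t, w) for w in words) for t in terms)
        if score:
            results.append((doc_id, score))
    # Highest score first; ties broken by doc id for stable output.
    return sorted(results, key=lambda r: (-r[1], r[0]))
```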


Embedding Search

Our embedding search uses machine learning models to convert search queries and documents into numerical vectors, enabling semantic matching based on vector similarity. Related content lies close together in the embedding space, and algorithms such as k-nearest neighbors (KNN) retrieve the documents whose vectors are closest to the query vector. Key features include context awareness, support for natural language queries, multi-modal search across various content types, and effective handling of synonyms.

Parameters:

  • vector (List[float]): The embedding vector representing the semantic meaning of the query.
  • alpha (float, default 0.7): The weight assigned to embedding-based results versus text-based results.
  • confidence (float, default 0.5): The confidence threshold for embedding similarity.
  • metadata_filter (object, optional): Filters to apply based on document metadata.
  • k (int, default 3): The number of top results to return.
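A brute-force version of the k-nearest-neighbors step can be sketched as follows. It assumes cosine similarity as the measure and assumes confidence acts as the minimum similarity a document must reach to be returned; both are assumptions about Julep's internals, which may use a different metric or an approximate index. The function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(
    query: list[float],
    doc_vectors: dict[str, list[float]],
    k: int = 3,
    confidence: float = 0.5,
) -> list[tuple[str, float]]:
    """Brute-force KNN: keep docs at or above `confidence`, return the top k."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in doc_vectors.items()]
    hits = [(doc_id, s) for doc_id, s in scored if s >= confidence]
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:k]
```

Real embeddings have far more dimensions than the toy vectors you would use to try this out, but the ranking logic is the same.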

Hybrid Search

Our hybrid search combines full-text search and embedding search. Keyword matching captures exact terms, while vector similarity captures the meaning of the query in context, so results are ranked on both. This combination improves result relevance and handles a wider range of queries than either method alone.

Parameters:

  • text (str): The textual query to search within documents.
  • vector (List[float]): The embedding vector representing the semantic meaning of the query.
  • alpha (float, default 0.7): The weight assigned to embedding-based results versus text-based results.
  • confidence (float, default 0.5): The confidence threshold for embedding similarity.
  • metadata_filter (object, optional): Filters to apply based on document metadata.
  • k (int, default 3): The number of top results to return.
  • lang (str, default 'en-US'): The language to use for full-text search processing.
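One way alpha could combine the two result sets is a weighted sum: after scaling both score types to [0, 1], each document's score is alpha times its embedding score plus (1 - alpha) times its text score. This is an assumption for illustration, and Julep's own fusion may differ (for example, by fusing ranks instead of scores). hybrid_scores is a hypothetical helper, and it leaves out confidence and lang.

```python
def hybrid_scores(
    text_scores: dict[str, float],
    vector_scores: dict[str, float],
    alpha: float = 0.7,
    k: int = 3,
) -> list[tuple[str, float]]:
    """Blend scores as alpha * vector + (1 - alpha) * text (assumed fusion)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    def normalize(scores: dict[str, float]) -> dict[str, float]:
        # Min-max scale to [0, 1] so the two score types are comparable.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {d: 1.0 for d in scores}
        return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

    t, v = normalize(text_scores), normalize(vector_scores)
    # A doc missing from one result set contributes 0 for that component.
    combined = {
        d: alpha * v.get(d, 0.0) + (1 - alpha) * t.get(d, 0.0)
        for d in set(t) | set(v)
    }
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

With alpha = 1.0 this reduces to pure embedding ranking, and with alpha = 0.0 to pure text ranking.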


Filtering

While search operates on the textual and/or semantic content of documents, you can also filter documents by their metadata.
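Metadata filtering can be pictured as the check below, which keeps a document only when every key in the filter is present in its metadata with an equal value. This exact-match behavior is an assumption for illustration (the real filter may support other operators), and filter_docs and matches_filter are hypothetical helpers. In an actual search you would pass a dictionary like this as the metadata_filter parameter.

```python
from typing import Any, Optional


def matches_filter(metadata: Optional[dict[str, Any]], metadata_filter: dict[str, Any]) -> bool:
    """Assumed semantics: every filter key must be present with an equal value."""
    metadata = metadata or {}
    return all(key in metadata and metadata[key] == value
               for key, value in metadata_filter.items())


def filter_docs(docs: list[dict[str, Any]], metadata_filter: dict[str, Any]) -> list[dict[str, Any]]:
    # Docs without metadata only pass an empty filter.
    return [doc for doc in docs if matches_filter(doc.get("metadata"), metadata_filter)]
```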


Relationship to Other Concepts

Sessions

Sessions can search, retrieve, and reference agent and user documents inside chat conversations. Read more about it in the Sessions documentation.

Tasks

By using System Tools, Julep tasks can create, search, filter, and read documents.


Check out this cookbook, which uses Julep's docs, system tools, and tasks to create content-rich user personas.

Best Practices

Organize Metadata

  • Metadata: Use consistent and descriptive metadata to improve document retrieval and filtering.
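One way to keep metadata consistent is to normalize it before creating each doc. In the sketch below, the required keys category and source are a hypothetical team convention, not something Julep requires, and normalize_metadata is an illustrative helper.

```python
from typing import Any

REQUIRED_KEYS = {"category", "source"}  # hypothetical team convention


def normalize_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Lower-case keys and string values, and enforce the required keys."""
    normalized = {
        key.strip().lower(): value.strip().lower() if isinstance(value, str) else value
        for key, value in metadata.items()
    }
    missing = REQUIRED_KEYS - set(normalized)
    if missing:
        raise ValueError(f"missing metadata keys: {sorted(missing)}")
    return normalized
```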

Version Control

  • Version Control: Track changes and updates to documents over time.

Security and Privacy

  • Access Control: Protect sensitive information and make sure access to documents is properly managed.

Efficient Chunking

  • Chunking Strategies: Choose chunk sizes and boundaries that suit your content, so documents are processed and retrieved efficiently.

Regular Updates

  • Updates: Regularly update document content and metadata to keep information current and relevant.

Next Steps

  • Sessions - Learn about sessions and how documents are used in chat conversations.
  • Tools - Learn about tools and how they can be used to fill documents with content.
  • Tasks - Learn about tasks and how to use documents inside tasks.
  • Cookbooks - Check out cookbooks to see how Julep can be used in real-world scenarios.