Working with documents in Julep
`text-embedding-3-large` model from OpenAI.

Field | Type | Description | Default |
---|---|---|---|
title | string | The title of the document. | Required |
content | string | The content of the document. | Required |
metadata | object | Additional metadata for the document, such as preferences or settings. | null |
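As a minimal sketch of how the fields in the table fit together, the snippet below assembles and validates a document payload before it is sent. The helper name `build_doc_payload` is hypothetical and not part of the Julep SDK. The commented-out call at the end assumes that `client.agents.docs.create` accepts keyword arguments named after the table's fields.

```python
from typing import Any, Dict, Optional


def build_doc_payload(
    title: str,
    content: str,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: check the required fields and build a payload dict."""
    if not isinstance(title, str) or not title.strip():
        raise ValueError("title is required and must be a non-empty string")
    if not isinstance(content, str) or not content.strip():
        raise ValueError("content is required and must be a non-empty string")

    payload: Dict[str, Any] = {"title": title, "content": content}
    # metadata is optional; leave it out entirely when it was not supplied
    if metadata is not None:
        payload["metadata"] = metadata
    return payload


payload = build_doc_payload(
    title="Onboarding guide",
    content="Welcome to the team. This guide covers your first week.",
    metadata={"department": "engineering"},
)

# Assumed call shape, with a configured client and a real agent ID:
# client.agents.docs.create(agent_id="<agent-id>", **payload)
```

Validating locally like this is optional; it only surfaces a missing `title` or `content` before a request is made.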
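To create a document for a user instead of an agent, replace `client.agents.docs.create` with `client.users.docs.create`, and the `agent_id` argument with `user_id`.

To retrieve a document, use the `client.docs.get` method, and pass the doc's ID.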
Example:
The returned document includes its `title`, `content`, `embeddings`, `metadata`, and other attributes.
Sample Output
`chunk_doc` that can be used directly in task steps. For implementation details, you can check the source code for this method in this file.
Parameter | Type | Description | Default |
---|---|---|---|
text | str | The textual query to search within documents. | Required |
metadata_filter | object | Filters to apply based on document metadata. | None |
lang | str | The language to use for full-text search processing. | 'english' |
limit | int | The maximum number of documents to return. | 10 |
trigram_similarity_threshold | float | The threshold for trigram similarity matching (higher values require closer matches). | 0.6 |
Sample Output
Parameter | Type | Description | Default |
---|---|---|---|
vector | number[] | The embedding vector representing the semantic meaning of the query. | Required |
limit | integer | The number of top results to return (must be between 1 and 50). | 10 |
lang | string | The language for the search query. | en-US |
metadata_filter | object | Filters to apply based on document metadata. | None |
mmr_strength | number | The strength of Maximum Marginal Relevance diversification (must be between 0 and 1). | 0.5 |
confidence | number | The confidence threshold for embedding similarity (must be between -1 and 1). | 0.5 |
Sample Output
Parameter | Type | Description | Default |
---|---|---|---|
text | str | The textual query to search within documents. | Required |
vector | List[float] | The embedding vector representing the semantic meaning of the query. | Required |
alpha | float | The weight assigned to embedding-based results versus text-based results (must be between 0 and 1). | 0.5 |
confidence | float | The confidence threshold for embedding similarity (must be between -1 and 1). | 0.5 |
metadata_filter | object | Filters to apply based on document metadata. | None |
limit | int | The number of top results to return. | 3 |
lang | str | The language to use for full-text search processing. | english_unaccent |
mmr_strength | float | The strength of Maximum Marginal Relevance diversification (must be between 0 and 1). | 0.5 |
trigram_similarity_threshold | float | The threshold for trigram similarity matching (must be between 0 and 1). | 0.6 |
k_multiplier | int | Controls how many intermediate results to fetch before final scoring. | 7 |
Sample Output
`pg_trgm` extension and enhanced with additional similarity techniques. This allows for resilient document retrieval that can handle variations in text, including:
`trigram_similarity_threshold` - Controls the minimum similarity score required for a document to match.
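To make the effect of the threshold concrete, here is a rough pure-Python approximation of the trigram similarity that `pg_trgm` computes: each word is lowercased, padded with two leading spaces and one trailing space, split into three-character windows, and two strings are compared by the ratio of shared trigrams to all distinct trigrams. The function names are hypothetical, and this sketch does not reproduce the additional similarity techniques Julep layers on top.

```python
import re


def trigrams(text: str) -> set:
    """Collect the trigram set of a string, padding each word pg_trgm-style."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def filter_by_threshold(query: str, candidates: list, threshold: float = 0.6) -> list:
    """Keep only the candidates whose similarity to the query meets the threshold."""
    return [c for c in candidates if trigram_similarity(query, c) >= threshold]


matches = filter_by_threshold("document", ["documents", "banana"])
```

Under this approximation, "document" and "documents" share 8 of 11 distinct trigrams (about 0.73), so the pair passes the default threshold of 0.6, while "banana" shares none and is dropped. Raising the threshold toward 1 would require progressively closer spellings.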
`alpha` - Balances the importance of vector-based semantic search vs. text-based search.
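One common way to apply such a weight is linear interpolation between the two scores. The sketch below assumes that form, with `alpha` weighting the embedding score and `1 - alpha` weighting the text score; the exact formula Julep uses is not stated here, and the function names and sample scores are made up for illustration.

```python
def blend_scores(text_score: float, vector_score: float, alpha: float = 0.5) -> float:
    """Assumed blending rule: alpha weights the embedding score, (1 - alpha) the text score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * text_score


def rank_hybrid(scores: dict, alpha: float = 0.5) -> list:
    """Order doc IDs by blended score, highest first.

    `scores` maps a doc ID to a (text_score, vector_score) pair.
    """
    return sorted(
        scores,
        key=lambda doc_id: blend_scores(scores[doc_id][0], scores[doc_id][1], alpha),
        reverse=True,
    )


# Illustrative scores only: doc_a matches the keywords, doc_b matches the meaning.
example_scores = {"doc_a": (0.9, 0.2), "doc_b": (0.3, 0.8)}

text_heavy = rank_hybrid(example_scores, alpha=0.1)
vector_heavy = rank_hybrid(example_scores, alpha=0.9)
```

With a low `alpha` the keyword match (`doc_a`) ranks first, and with a high `alpha` the semantic match (`doc_b`) does, which is the trade-off the parameter is meant to expose.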
`lang` - Affects tokenization, stemming, and other text processing operations.