## Overview
Julep leverages LiteLLM to connect you to a wide array of Language Models (LLMs). This integration offers incredible flexibility, letting you tap into models from various providers through a single, unified interface. Switching between providers is a breeze, and functionality stays consistent across the board.
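For example, switching providers typically comes down to changing a single model string. Here is a minimal sketch using the Python SDK (the agent name and description are placeholders):

```python
from julep import Julep

client = Julep(api_key="YOUR_JULEP_API_KEY")

# The same agent definition works across providers; to switch
# providers, change only the model string.
agent = client.agents.create(
    name="summarizer",
    model="claude-3.5-sonnet",  # e.g., swap to "gpt-4o" or "gemini-2.5-pro"
    about="Summarizes long documents into bullet points.",
)
```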
## Available Models
While we provide API keys for quick testing and development, you'll need to use your own API keys when deploying to production. This ensures you have full control over your usage and billing.
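In production, the key should come from your own environment rather than being hard-coded; a minimal sketch, assuming the key is exported as `JULEP_API_KEY`:

```python
import os

from julep import Julep

# Keep the key out of source control by reading it from the environment.
client = Julep(api_key=os.environ["JULEP_API_KEY"])
```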
Looking for top-notch quality? Our curated selection of models delivers excellent outputs for all your use cases.
### Anthropic
Here are the Anthropic models supported by Julep:

Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
---|---|---|---|---|---|---|---|
claude-3-haiku | 200K tokens | 4K tokens | | | | | Budget |
claude-3-sonnet | 200K tokens | 4K tokens | | | | | Premium |
claude-3.5-haiku | 200K tokens | 8K tokens | | | | | Standard |
claude-3.5-sonnet | 200K tokens | 8K tokens | | | | | Premium |
claude-3.5-sonnet-20240620 | 200K tokens | 4K tokens | | | | | Premium |
claude-3.5-sonnet-20241022 | 200K tokens | 8K tokens | | | | | Premium |
claude-3.7-sonnet | 200K tokens | 8K tokens | | | | | Premium |
claude-opus-4 | 200K tokens | 32K tokens | | | | | Enterprise |
claude-opus-4-1 | 200K tokens | 32K tokens | | | | | Enterprise |
claude-sonnet-4 | 200K tokens | 64K tokens | | | | | Premium |
### Google

Here are the Google models supported by Julep:

Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
---|---|---|---|---|---|---|---|
gemini-1.5-pro | 2M tokens | 8K tokens | | | | | Standard |
gemini-1.5-pro-latest | 1M tokens | 8K tokens | | | | | Premium |
gemini-2.0-flash | 1M tokens | 8K tokens | | | | | Budget |
gemini-2.5-flash | 1M tokens | 65K tokens | | | | | Budget |
gemini-2.5-pro | 1M tokens | 65K tokens | | | | | Standard |
gemini-2.5-pro-preview-03-25 | 1M tokens | 65K tokens | | | | | Standard |
gemini-2.5-pro-preview-06-05 | 1M tokens | 65K tokens | | | | | Standard |
### OpenAI
Here are the OpenAI models supported by Julep:

Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
---|---|---|---|---|---|---|---|
gpt-4-turbo | 128K tokens | 4K tokens | | | | | Enterprise |
gpt-4.1 | 1M tokens | 32K tokens | | | | | Premium |
gpt-4.1-mini | 1M tokens | 32K tokens | | | | | Budget |
gpt-4.1-nano | 1M tokens | 32K tokens | | | | | Budget |
gpt-4o | 128K tokens | 16K tokens | | | | | Premium |
gpt-4o-mini | 128K tokens | 16K tokens | | | | | Budget |
gpt-5 | 400K tokens | 128K tokens | | | | | Standard |
gpt-5-2025-08-07 | 400K tokens | 128K tokens | | | | | Standard |
gpt-5-chat | 400K tokens | 128K tokens | | | | | Standard |
gpt-5-chat-latest | 400K tokens | 128K tokens | | | | | Standard |
gpt-5-mini | 400K tokens | 128K tokens | | | | | Budget |
gpt-5-mini-2025-08-07 | 400K tokens | 128K tokens | | | | | Budget |
gpt-5-nano | 400K tokens | 128K tokens | | | | | Budget |
gpt-5-nano-2025-08-07 | 400K tokens | 128K tokens | | | | | Budget |
o1 | 200K tokens | 100K tokens | | | | | Enterprise |
o1-mini | 128K tokens | 65K tokens | | | | | Standard |
o1-preview | 128K tokens | 32K tokens | | | | | Enterprise |
o3-mini | 200K tokens | 100K tokens | | | | | Standard |
o4-mini | 200K tokens | 100K tokens | | | | | Standard |
### Groq
Here are the Groq models supported by Julep:

Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
---|---|---|---|---|---|---|---|
deepseek-r1-distill-llama-70b | 128K tokens | 128K tokens | | | | | Standard |
gemma2-9b-it | 8K tokens | 8K tokens | | | | | Budget |
llama-3.1-8b | 128K tokens | 8K tokens | | | | | Budget |
llama-3.1-8b-instant | 128K tokens | 8K tokens | | | | | Budget |
llama-3.3-70b-versatile | 128K tokens | 32K tokens | | | | | Standard |
meta-llama/Llama-Guard-4-12B | 163K tokens | 163K tokens | | | | | Budget |
meta-llama/llama-4-maverick-17b-128e-instruct | 131K tokens | 8K tokens | | | | | Budget |
meta-llama/llama-4-scout-17b-16e-instruct | 131K tokens | 8K tokens | | | | | Budget |
qwen/qwen3-32b | 131K tokens | 131K tokens | | | | | Budget |
### OpenRouter
Here are the OpenRouter models supported by Julep:

Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
---|---|---|---|---|---|---|---|
deepseek-chat | 65K tokens | 8K tokens | | | | | Budget |
deepseek/deepseek-r1-distill-llama-70b | 65K tokens | 8K tokens | | | | | Standard |
deepseek/deepseek-r1-distill-qwen-32b | 65K tokens | 8K tokens | | | | | Standard |
eva-llama-3.33-70b | Unknown | Unknown | | | | | Unknown |
eva-qwen-2.5-72b | Unknown | Unknown | | | | | Unknown |
hermes-3-llama-3.1-70b | Unknown | Unknown | | | | | Unknown |
l3.1-euryale-70b | 200K tokens | 100K tokens | | | | | Enterprise |
l3.3-euryale-70b | 200K tokens | 100K tokens | | | | | Enterprise |
magnum-v4-72b | Unknown | Unknown | | | | | Unknown |
meta-llama/llama-3.1-8b-instruct | Unknown | Unknown | | | | | Unknown |
meta-llama/llama-3.3-70b-instruct | Unknown | Unknown | | | | | Unknown |
meta-llama/llama-4-scout | 131K tokens | 8K tokens | | | | | Budget |
mistral-large-2411 | 128K tokens | 128K tokens | | | | | Premium |
openrouter/meta-llama/llama-4-maverick | 131K tokens | 8K tokens | | | | | Budget |
openrouter/meta-llama/llama-4-maverick:free | Unknown | Unknown | | | | | Unknown |
openrouter/meta-llama/llama-4-scout | 131K tokens | 8K tokens | | | | | Budget |
openrouter/meta-llama/llama-4-scout:free | Unknown | Unknown | | | | | Unknown |
perplexity/sonar | 128K tokens | Unknown | | | | | Standard |
perplexity/sonar-deep-research | 128K tokens | Unknown | | | | | Premium |
perplexity/sonar-pro | 200K tokens | 8K tokens | | | | | Premium |
perplexity/sonar-reasoning | 128K tokens | Unknown | | | | | Standard |
perplexity/sonar-reasoning-pro | 128K tokens | Unknown | | | | | Premium |
qwen-2.5-72b-instruct | Unknown | Unknown | | | | | Unknown |
### Amazon Nova
Here are the Amazon Nova models supported by Julep:

Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
---|---|---|---|---|---|---|---|
amazon/nova-lite-v1 | Unknown | Unknown | | | | | Unknown |
amazon/nova-micro-v1 | Unknown | Unknown | | | | | Unknown |
amazon/nova-pro-v1 | Unknown | Unknown | | | | | Unknown |
### Embedding
Here are the embedding models supported by Julep:

Model Name | Embedding Dimensions |
---|---|
Alibaba-NLP/gte-large-en-v1.5 | 1024 |
BAAI/bge-m3 | 1024 |
text-embedding-3-large | 1024 |
vertex_ai/text-embedding-004 | 1024 |
voyage-3 | 1024 |
voyage-multilingual-2 | 1024 |
Though the models above natively support different embedding dimensions, Julep currently uses a fixed 1024 dimensions for all embedding models. We plan to support other dimensions in the future.
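A practical upshot of the fixed dimensionality is that any two Julep-produced embeddings live in the same 1024-dimensional space, so similarity math between them is always well-defined. A toy illustration (random vectors stand in for real embeddings):

```python
import numpy as np

# Any two Julep embeddings share the same 1024-dim space,
# so cosine similarity between them is always well-defined.
a = np.random.rand(1024)
b = np.random.rand(1024)

cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```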
## Supported Parameters
The following is a list of parameters that can be used to control the behavior of the models.

### Core Parameters
Parameter | Range | Description |
---|---|---|
temperature | 0.0 - 5.0 | Controls randomness in outputs. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused and deterministic |
top_p | 0.0 - 1.0 | Alternative to temperature for nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability exceeds top_p. We recommend adjusting either this or temperature, not both |
max_tokens | ≥ 1 | Maximum number of tokens to generate in the response |
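Here is a sketch of how the core parameters might be passed on a chat request. It uses the Python SDK and assumes these settings flow through to the underlying model; the agent, session, and prompt are placeholders:

```python
from julep import Julep

client = Julep(api_key="YOUR_JULEP_API_KEY")
agent = client.agents.create(
    name="copywriter",
    model="gpt-4o",
    about="Writes short marketing copy.",
)
session = client.sessions.create(agent=agent.id)

response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "Give me three taglines for a coffee shop."}],
    temperature=0.7,  # moderate randomness; lower it for more deterministic output
    max_tokens=256,   # hard cap on the number of generated tokens
)
print(response)
```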
### Penalty Parameters
Parameter | Range | Description |
---|---|---|
frequency_penalty | -2.0 - 2.0 | Penalizes tokens based on their frequency in the text. Positive values decrease repetition |
presence_penalty | -2.0 - 2.0 | Penalizes tokens based on their presence in the text. Positive values decrease likelihood of repeating content |
repetition_penalty | 0.0 - 2.0 | Penalizes repetition (1.0 is neutral). Values > 1.0 reduce likelihood of repeating content |
length_penalty | 0.0 - 2.0 | Penalizes based on generation length (1.0 is neutral). Values > 1.0 penalize longer generations |
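Continuing the sketch above, a penalty parameter can be added to the same chat call; per the best-practices note below, it is usually best to set only one of them (the value here is illustrative):

```python
# Reuses `client` and `session` from the previous sketch.
response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "Write a limerick about the sea."}],
    temperature=0.8,
    frequency_penalty=0.5,  # discourage repeating the same words
)
```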
### Advanced Controls
Parameter | Range | Description |
---|---|---|
min_p | 0.0 - 1.0 | Minimum probability threshold compared to the highest token probability |
seed | integer | For deterministic generation. Set a specific seed for reproducible results |
stop | list[str] | Up to 4 sequences where generation should stop |
response_format | object | Control output format: {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}} |
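For example, `seed` and `response_format` might be combined to get reproducible, structured output, again extending the session sketch above; whether a given model honors these settings depends on the support notes that follow:

```python
# Reuses `client` and `session` from the earlier sketch.
response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "List three HTTP methods as a JSON array."}],
    seed=42,                                  # reproducible sampling where supported
    response_format={"type": "json_object"},  # ask the model to emit valid JSON
)
```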
Not all parameters are supported by every model. Please refer to the LiteLLM documentation for more details.

Response Format Support: The `response_format` parameter is supported by OpenAI, Azure OpenAI, Google AI Studio (Gemini), Vertex AI, Bedrock, Anthropic, Groq, xAI (Grok-2 and later), Databricks, and Ollama. For the most up-to-date list, check the LiteLLM JSON Mode documentation.

Best Practices:
- Start with default values and adjust based on your needs
- Use temperature (0.0 - 1.0) for most cases
- Avoid setting multiple penalty parameters simultaneously
- Test different combinations for optimal results
Setting extreme values for multiple parameters may lead to unexpected behavior or poor quality outputs.
## Usage Guidelines
### Consider Model Selection Criteria
1. Your budget and cost constraints
2. How fast you need responses
3. The quality you're aiming for
4. The context window size you require
### Follow Best Practices
1. Start with smaller models for development and testing
2. Use larger context windows only when necessary
3. Keep an eye on token usage to manage costs (see the sketch below)
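To keep an eye on token usage, inspect the usage details returned with each response. A hedged sketch, assuming the response exposes an OpenAI-style `usage` object (the attribute names are hypothetical; verify them against your SDK version):

```python
# Reuses `client` and `session` from the earlier sketches.
response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "Summarize our conversation so far."}],
)

usage = getattr(response, "usage", None)  # hypothetical field; check your SDK's schema
if usage is not None:
    print(f"prompt: {usage.prompt_tokens} tokens, completion: {usage.completion_tokens} tokens")
```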
For more information, please refer to the LiteLLM documentation.