Comprehensive guide to AI models and parameters supported by Julep
Anthropic

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| claude-3-haiku | 200K tokens | 4K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| claude-3-opus | 200K tokens | 4K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| claude-3-sonnet | 200K tokens | 4K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| claude-3.5-haiku | 200K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| claude-3.5-sonnet | 200K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| claude-3.5-sonnet-20240620 | 200K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| claude-3.5-sonnet-20241022 | 200K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| claude-3.7-sonnet | 200K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| claude-opus-4 | 200K tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| claude-sonnet-4 | 200K tokens | 64K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
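Any model name in the tables on this page can be passed as the `model` field when creating an agent. A minimal sketch with the Julep Python SDK (`pip install julep`), assuming an API key in the `JULEP_API_KEY` environment variable; the agent name and description are illustrative:

```python
import os

from julep import Julep

client = Julep(api_key=os.environ["JULEP_API_KEY"])

# Create an agent backed by one of the models listed on this page.
agent = client.agents.create(
    name="research-assistant",    # illustrative name
    model="claude-3.5-sonnet",    # any model name from these tables
    about="Summarizes technical papers.",
)
print(agent.id)
```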
Google

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| gemini-1.5-pro | 2M tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| gemini-1.5-pro-latest | 1M tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| gemini-2.0-flash | 1M tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| gemini-2.5-flash | 1M tokens | 65K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| gemini-2.5-pro | 1M tokens | 65K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| gemini-2.5-pro-preview-03-25 | 1M tokens | 65K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| gemini-2.5-pro-preview-06-05 | 1M tokens | 65K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
OpenAI

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| gpt-4-turbo | 128K tokens | 4K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| gpt-4.1 | 1M tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| gpt-4.1-mini | 1M tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| gpt-4.1-nano | 1M tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| gpt-4o | 128K tokens | 16K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| gpt-4o-mini | 128K tokens | 16K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| o1 | 200K tokens | 100K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| o1-mini | 128K tokens | 65K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| o1-preview | 128K tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| o3-mini | 200K tokens | 100K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| o4-mini | 200K tokens | 100K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
Groq

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| deepseek-r1-distill-llama-70b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| gemma2-9b-it | 8K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| llama-3.1-70b | 8K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| llama-3.1-8b | 128K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| llama-3.1-8b-instant | 128K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| llama-3.3-70b-versatile | 128K tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| meta-llama/Llama-Guard-4-12B | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| meta-llama/llama-4-maverick-17b-128e-instruct | 131K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| meta-llama/llama-4-scout-17b-16e-instruct | 131K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| mistral-saba-24b | 32K tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| qwen-qwq-32b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| qwen/qwen3-32b | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
OpenRouter

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| deepseek-chat | 65K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| eva-llama-3.33-70b | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| eva-qwen-2.5-72b | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| hermes-3-llama-3.1-70b | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| l3.1-euryale-70b | 200K tokens | 100K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| l3.3-euryale-70b | 200K tokens | 100K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| magnum-v4-72b | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| mistral-large-2411 | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| openrouter/meta-llama/llama-4-maverick | 131K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| openrouter/meta-llama/llama-4-maverick:free | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| openrouter/meta-llama/llama-4-scout | 131K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| openrouter/meta-llama/llama-4-scout:free | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| qwen-2.5-72b-instruct | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
Cerebras

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| cerebras/deepseek-r1-distill-llama-70b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| cerebras/llama-3.1-8b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| cerebras/llama-3.3-70b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| cerebras/llama-4-scout-17b-16e-instruct | 131K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| cerebras/qwen-3-32b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
Amazon

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| amazon/nova-lite-v1 | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| amazon/nova-micro-v1 | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| amazon/nova-pro-v1 | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
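Some IDs above carry a provider prefix (for example `openrouter/...` or `cerebras/...`). Julep's model routing follows LiteLLM conventions (the note at the end of this page points to LiteLLM's documentation), so the prefix pins the request to that provider. A sketch, reusing the `client` from the earlier example:

```python
# `client` as constructed in the earlier sketch; the cerebras/ prefix
# routes this agent's requests through Cerebras rather than another host.
fast_agent = client.agents.create(
    name="fast-summarizer",
    model="cerebras/llama-3.3-70b",
    about="Low-latency summaries.",
)
```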
Embedding Models

| Model Name | Embedding Dimensions |
|---|---|
| Alibaba-NLP/gte-large-en-v1.5 | 1024 |
| BAAI/bge-m3 | 1024 |
| text-embedding-3-large | 1024 |
| vertex_ai/text-embedding-004 | 1024 |
| voyage-3 | 1024 |
| voyage-multilingual-2 | 1024 |
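All embedding models above produce 1024-dimensional vectors. Note that vectors are only meaningfully comparable when the query and the documents were embedded by the same model; the comparison itself is typically cosine similarity. A self-contained illustration (plain NumPy, no Julep-specific API assumed):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for real 1024-dimensional embeddings from any model above.
query_vec = np.random.default_rng(0).normal(size=1024)
doc_vec = np.random.default_rng(1).normal(size=1024)
print(cosine_similarity(query_vec, doc_vec))
```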
Core Parameters

| Parameter | Range | Description |
|---|---|---|
| temperature | 0.0 - 5.0 | Controls randomness in outputs. Higher values (e.g., 0.8) increase randomness; lower values (e.g., 0.2) make output more focused and deterministic |
| top_p | 0.0 - 1.0 | Nucleus sampling, an alternative to temperature: only the smallest set of tokens whose cumulative probability reaches top_p is considered. We recommend adjusting either this or temperature, not both |
| max_tokens | ≥ 1 | Maximum number of tokens to generate in the response |
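These settings can be supplied per agent. A hedged sketch, assuming agents accept a `default_settings` object whose keys mirror the parameter names above (reusing the `client` from the first example):

```python
# Conservative defaults: low temperature for focused output, capped length.
support_bot = client.agents.create(
    name="support-bot",
    model="gpt-4o-mini",
    about="Answers product questions.",
    default_settings={
        "temperature": 0.2,  # 0.0 - 5.0; lower values are more deterministic
        "max_tokens": 1024,  # hard cap on the length of each response
        # "top_p": 0.9,      # alternative to temperature; tune one, not both
    },
)
```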
Penalty Parameters

| Parameter | Range | Description |
|---|---|---|
| frequency_penalty | -2.0 - 2.0 | Penalizes tokens in proportion to how often they have already appeared in the text. Positive values decrease repetition |
| presence_penalty | -2.0 - 2.0 | Applies a one-time penalty to any token that has already appeared at all. Positive values decrease the likelihood of repeating content |
| repetition_penalty | 0.0 - 2.0 | Penalizes repetition (1.0 is neutral). Values > 1.0 reduce the likelihood of repeating content |
| length_penalty | 0.0 - 2.0 | Penalizes based on generation length (1.0 is neutral). Values > 1.0 penalize longer generations |
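The practical difference between frequency_penalty and presence_penalty is easiest to see in the OpenAI-style logit adjustment they implement: the frequency term grows with every repetition of a token, while the presence term is a flat, one-time penalty. A sketch of that adjustment:

```python
def penalized_logit(logit: float, count: int,
                    frequency_penalty: float, presence_penalty: float) -> float:
    """OpenAI-style penalty applied to one token's logit before sampling.

    count is how many times the token already occurs in the generated text.
    """
    return (
        logit
        - frequency_penalty * count                       # grows per repetition
        - presence_penalty * (1.0 if count > 0 else 0.0)  # flat, one-time hit
    )
```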
Advanced Controls

| Parameter | Range | Description |
|---|---|---|
| min_p | 0.0 - 1.0 | Minimum probability threshold, relative to the most likely token: tokens whose probability falls below min_p × (top token's probability) are filtered out |
| seed | integer | Seed for deterministic generation; set a specific value for reproducible results |
| stop | list[str] | Up to 4 sequences at which generation stops |
| response_format | object | Controls output format: {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}} |
The `response_format` parameter is supported by OpenAI, Azure OpenAI, Google AI Studio (Gemini), Vertex AI, Bedrock, Anthropic, Groq, xAI (Grok-2 and later), Databricks, and Ollama. For the most up-to-date list, see the LiteLLM JSON Mode documentation.
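For example, a request for structured JSON output with a pinned seed might look like the following. This is a hedged sketch: it assumes a session already exists and that the chat call forwards response_format, seed, and stop to the underlying model; check the API reference for exact signatures.

```python
# Hypothetical values: `session` is an existing Julep session (creation not
# shown here); field names follow the parameter tables above.
response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "List three Claude models as JSON."}],
    response_format={"type": "json_object"},  # or {"type": "json_schema", ...}
    seed=42,          # reproducible sampling where the provider supports it
    stop=["\n\n"],    # up to 4 stop sequences
)
print(response.choices[0].message.content)
```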