Comprehensive guide to AI models and parameters supported by Julep
Anthropic

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| claude-3-haiku | 200K tokens | 4K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| claude-3-opus | 200K tokens | 4K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| claude-3-sonnet | 200K tokens | 4K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| claude-3.5-haiku | 200K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| claude-3.5-sonnet | 200K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| claude-3.5-sonnet-20240620 | 200K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| claude-3.5-sonnet-20241022 | 200K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| claude-3.7-sonnet | 200K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| claude-opus-4 | 200K tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| claude-sonnet-4 | 200K tokens | 64K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
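Any model name in the tables on this page can be passed as the `model` field when creating an agent. A minimal sketch with the Julep Python SDK (`pip install julep`), assuming an API key in the `JULEP_API_KEY` environment variable; the agent name and description are illustrative:

```python
import os

from julep import Julep

client = Julep(api_key=os.environ["JULEP_API_KEY"])

# Create an agent backed by one of the models listed on this page.
agent = client.agents.create(
    name="research-assistant",    # illustrative name
    model="claude-3.5-sonnet",    # any model name from these tables
    about="Summarizes technical papers.",
)
print(agent.id)
```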
Google

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| gemini-1.5-pro | 2M tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| gemini-1.5-pro-latest | 1M tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| gemini-2.0-flash | 1M tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| gemini-2.5-flash | 1M tokens | 65K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| gemini-2.5-pro | 1M tokens | 65K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| gemini-2.5-pro-preview-03-25 | 1M tokens | 65K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| gemini-2.5-pro-preview-06-05 | 1M tokens | 65K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
OpenAI

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| gpt-4-turbo | 128K tokens | 4K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| gpt-4.1 | 1M tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| gpt-4.1-mini | 1M tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| gpt-4.1-nano | 1M tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| gpt-4o | 128K tokens | 16K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| gpt-4o-mini | 128K tokens | 16K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| o1 | 200K tokens | 100K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| o1-mini | 128K tokens | 65K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| o1-preview | 128K tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| o3-mini | 200K tokens | 100K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| o4-mini | 200K tokens | 100K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
Groq

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| deepseek-r1-distill-llama-70b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| gemma2-9b-it | 8K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| llama-3.1-70b | 8K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| llama-3.1-8b | 128K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| llama-3.1-8b-instant | 128K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| llama-3.3-70b-versatile | 128K tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| meta-llama/Llama-Guard-4-12B | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| meta-llama/llama-4-maverick-17b-128e-instruct | 131K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| meta-llama/llama-4-scout-17b-16e-instruct | 131K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| mistral-saba-24b | 32K tokens | 32K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| qwen-qwq-32b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| qwen/qwen3-32b | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
OpenRouter

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| deepseek-chat | 65K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| eva-llama-3.33-70b | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| eva-qwen-2.5-72b | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| hermes-3-llama-3.1-70b | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| l3.1-euryale-70b | 200K tokens | 100K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| l3.3-euryale-70b | 200K tokens | 100K tokens | Unknown | Unknown | Unknown | Unknown | Enterprise |
| magnum-v4-72b | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| mistral-large-2411 | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Premium |
| openrouter/meta-llama/llama-4-maverick | 131K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| openrouter/meta-llama/llama-4-maverick:free | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| openrouter/meta-llama/llama-4-scout | 131K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| openrouter/meta-llama/llama-4-scout:free | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| qwen-2.5-72b-instruct | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
Cerebras

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| cerebras/deepseek-r1-distill-llama-70b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| cerebras/llama-3.1-8b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| cerebras/llama-3.3-70b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Standard |
| cerebras/llama-4-scout-17b-16e-instruct | 131K tokens | 8K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
| cerebras/qwen-3-32b | 128K tokens | 128K tokens | Unknown | Unknown | Unknown | Unknown | Budget |
Amazon

| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
|---|---|---|---|---|---|---|---|
| amazon/nova-lite-v1 | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| amazon/nova-micro-v1 | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| amazon/nova-pro-v1 | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
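Some IDs above carry a provider prefix (for example `openrouter/...` or `cerebras/...`). Julep's model routing follows LiteLLM conventions (the note at the end of this page points to LiteLLM's documentation), so the prefix pins the request to that provider. A sketch, reusing the `client` from the earlier example:

```python
# `client` as constructed in the earlier sketch; the cerebras/ prefix
# routes this agent's requests through Cerebras rather than another host.
fast_agent = client.agents.create(
    name="fast-summarizer",
    model="cerebras/llama-3.3-70b",
    about="Low-latency summaries.",
)
```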
Embedding Models

| Model Name | Embedding Dimensions |
|---|---|
| Alibaba-NLP/gte-large-en-v1.5 | 1024 |
| BAAI/bge-m3 | 1024 |
| text-embedding-3-large | 1024 |
| vertex_ai/text-embedding-004 | 1024 |
| voyage-3 | 1024 |
| voyage-multilingual-2 | 1024 |
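All embedding models above produce 1024-dimensional vectors. Note that vectors are only meaningfully comparable when the query and the documents were embedded by the same model; the comparison itself is typically cosine similarity. A self-contained illustration (plain NumPy, no Julep-specific API assumed):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for real 1024-dimensional embeddings from any model above.
query_vec = np.random.default_rng(0).normal(size=1024)
doc_vec = np.random.default_rng(1).normal(size=1024)
print(cosine_similarity(query_vec, doc_vec))
```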
Core Parameters

| Parameter | Range | Description |
|---|---|---|
| temperature | 0.0 - 5.0 | Controls randomness in outputs. Higher values (e.g., 0.8) increase randomness; lower values (e.g., 0.2) make output more focused and deterministic |
| top_p | 0.0 - 1.0 | Nucleus sampling, an alternative to temperature: only the smallest set of tokens whose cumulative probability reaches top_p is considered. We recommend adjusting either this or temperature, not both |
| max_tokens | ≥ 1 | Maximum number of tokens to generate in the response |
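These settings can be supplied per agent. A hedged sketch, assuming agents accept a `default_settings` object whose keys mirror the parameter names above (reusing the `client` from the first example):

```python
# Conservative defaults: low temperature for focused output, capped length.
support_bot = client.agents.create(
    name="support-bot",
    model="gpt-4o-mini",
    about="Answers product questions.",
    default_settings={
        "temperature": 0.2,  # 0.0 - 5.0; lower values are more deterministic
        "max_tokens": 1024,  # hard cap on the length of each response
        # "top_p": 0.9,      # alternative to temperature; tune one, not both
    },
)
```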
Penalty Parameters

| Parameter | Range | Description |
|---|---|---|
| frequency_penalty | -2.0 - 2.0 | Penalizes tokens in proportion to how often they have already appeared in the text. Positive values decrease repetition |
| presence_penalty | -2.0 - 2.0 | Applies a one-time penalty to any token that has already appeared at all. Positive values decrease the likelihood of repeating content |
| repetition_penalty | 0.0 - 2.0 | Penalizes repetition (1.0 is neutral). Values > 1.0 reduce the likelihood of repeating content |
| length_penalty | 0.0 - 2.0 | Penalizes based on generation length (1.0 is neutral). Values > 1.0 penalize longer generations |
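The practical difference between frequency_penalty and presence_penalty is easiest to see in the OpenAI-style logit adjustment they implement: the frequency term grows with every repetition of a token, while the presence term is a flat, one-time penalty. A sketch of that adjustment:

```python
def penalized_logit(logit: float, count: int,
                    frequency_penalty: float, presence_penalty: float) -> float:
    """OpenAI-style penalty applied to one token's logit before sampling.

    count is how many times the token already occurs in the generated text.
    """
    return (
        logit
        - frequency_penalty * count                       # grows per repetition
        - presence_penalty * (1.0 if count > 0 else 0.0)  # flat, one-time hit
    )
```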
Advanced Controls

| Parameter | Range | Description |
|---|---|---|
| min_p | 0.0 - 1.0 | Minimum probability threshold, relative to the most likely token: tokens whose probability falls below min_p × (top token's probability) are filtered out |
| seed | integer | Seed for deterministic generation; set a specific value for reproducible results |
| stop | list[str] | Up to 4 sequences at which generation stops |
| response_format | object | Controls output format: {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}} |
The `response_format` parameter is supported by OpenAI, Azure OpenAI, Google AI Studio (Gemini), Vertex AI, Bedrock, Anthropic, Groq, xAI (Grok-2 and later), Databricks, and Ollama. For the most up-to-date list, see the LiteLLM JSON Mode documentation.
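For example, a request for structured JSON output with a pinned seed might look like the following. This is a hedged sketch: it assumes a session already exists and that the chat call forwards response_format, seed, and stop to the underlying model; check the API reference for exact signatures.

```python
# Hypothetical values: `session` is an existing Julep session (creation not
# shown here); field names follow the parameter tables above.
response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "List three Claude models as JSON."}],
    response_format={"type": "json_object"},  # or {"type": "json_schema", ...}
    seed=42,          # reproducible sampling where the provider supports it
    stop=["\n\n"],    # up to 4 stop sequences
)
print(response.choices[0].message.content)
```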