Overview

Julep uses LiteLLM to connect you to a wide range of Language Models (LLMs). This integration lets you use models from many providers through a single, unified interface.
Because the API is the same across providers, switching between them only requires changing the model name; the rest of your integration code stays the same.
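For example, here is a minimal sketch of creating an agent and later switching its model with the Julep Python SDK. The method names and fields follow the SDK's documented agent API, but treat the exact signatures as illustrative and confirm them against the API reference.

```python
# Minimal sketch using the Julep Python SDK (pip install julep).
# Assumes JULEP_API_KEY is set in the environment.
from julep import Julep

client = Julep()

# Create an agent backed by an Anthropic model.
agent = client.agents.create(
    name="Research Assistant",
    model="claude-3.5-sonnet",
    about="Summarizes research papers.",
)

# Switching providers is just a different model name; no other
# integration code needs to change.
client.agents.update(agent_id=agent.id, model="gpt-4o")
```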

Available Models

While we provide API keys for quick testing and development, you’ll need to use your own API keys when deploying to production. This gives you full control over your usage and billing.
The curated selection below spans a range of quality, speed, and cost tiers, so you can pick the right model for each use case.

Anthropic

Here are the Anthropic models supported by Julep:
| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
| --- | --- | --- | --- | --- | --- | --- | --- |
| claude-3-haiku | 200K tokens | 4K tokens | βœ… | βœ… | ❌ | βœ… | Budget |
| claude-3-opus | 200K tokens | 4K tokens | βœ… | βœ… | ❌ | βœ… | Enterprise |
| claude-3-sonnet | 200K tokens | 4K tokens | βœ… | βœ… | ❌ | βœ… | Premium |
| claude-3.5-haiku | 200K tokens | 8K tokens | βœ… | βœ… | ❌ | βœ… | Standard |
| claude-3.5-sonnet | 200K tokens | 8K tokens | βœ… | βœ… | ❌ | βœ… | Premium |
| claude-3.5-sonnet-20240620 | 200K tokens | 8K tokens | βœ… | βœ… | ❌ | βœ… | Premium |
| claude-3.5-sonnet-20241022 | 200K tokens | 8K tokens | βœ… | βœ… | ❌ | βœ… | Premium |
| claude-3.7-sonnet | 200K tokens | 128K tokens | βœ… | βœ… | ❌ | βœ… | Premium |
| claude-opus-4 | 200K tokens | 32K tokens | βœ… | βœ… | ❌ | βœ… | Enterprise |
| claude-sonnet-4 | 200K tokens | 64K tokens | βœ… | βœ… | ❌ | βœ… | Premium |

Google

Here are the Google models supported by Julep:
| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemini-1.5-pro | 2M tokens | 8K tokens | βœ… | βœ… | ❌ | ❌ | Standard |
| gemini-1.5-pro-latest | 1M tokens | 8K tokens | βœ… | βœ… | ❌ | ❌ | Premium |
| gemini-2.0-flash | 1M tokens | 8K tokens | βœ… | βœ… | βœ… | ❌ | Budget |
| gemini-2.5-flash | 1M tokens | 65K tokens | βœ… | βœ… | ❌ | ❌ | Budget |
| gemini-2.5-pro | 1M tokens | 65K tokens | βœ… | βœ… | βœ… | ❌ | Standard |
| gemini-2.5-pro-preview-03-25 | 1M tokens | 65K tokens | βœ… | βœ… | ❌ | ❌ | Standard |
| gemini-2.5-pro-preview-06-05 | 1M tokens | 65K tokens | βœ… | βœ… | ❌ | ❌ | Standard |

OpenAI

Here are the OpenAI models supported by Julep:
| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-4-turbo | 128K tokens | 4K tokens | βœ… | βœ… | ❌ | βœ… | Enterprise |
| gpt-4.1 | 1M tokens | 32K tokens | βœ… | βœ… | ❌ | βœ… | Premium |
| gpt-4.1-mini | 1M tokens | 32K tokens | βœ… | βœ… | ❌ | βœ… | Budget |
| gpt-4.1-nano | 1M tokens | 32K tokens | βœ… | βœ… | ❌ | βœ… | Budget |
| gpt-4o | 128K tokens | 16K tokens | βœ… | βœ… | ❌ | βœ… | Premium |
| gpt-4o-mini | 128K tokens | 16K tokens | βœ… | βœ… | ❌ | βœ… | Budget |
| o1 | 200K tokens | 100K tokens | βœ… | βœ… | ❌ | βœ… | Enterprise |
| o1-mini | 128K tokens | 65K tokens | ❌ | βœ… | ❌ | βœ… | Standard |
| o1-preview | 128K tokens | 32K tokens | ❌ | βœ… | ❌ | βœ… | Enterprise |
| o3-mini | 200K tokens | 100K tokens | βœ… | ❌ | ❌ | βœ… | Standard |
| o4-mini | 200K tokens | 100K tokens | βœ… | βœ… | ❌ | βœ… | Standard |

Groq

Here are the Groq models supported by Julep:
| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
| --- | --- | --- | --- | --- | --- | --- | --- |
| deepseek-r1-distill-llama-70b | 128K tokens | 128K tokens | βœ… | ❌ | ❌ | ❌ | Standard |
| gemma2-9b-it | 8K tokens | 8K tokens | ❌ | ❌ | ❌ | ❌ | Budget |
| llama-3.1-70b | 8K tokens | 8K tokens | βœ… | ❌ | ❌ | ❌ | Standard |
| llama-3.1-8b | 128K tokens | 8K tokens | βœ… | ❌ | ❌ | ❌ | Budget |
| llama-3.1-8b-instant | 128K tokens | 8K tokens | βœ… | ❌ | ❌ | ❌ | Budget |
| llama-3.3-70b-versatile | 128K tokens | 32K tokens | βœ… | ❌ | ❌ | ❌ | Standard |
| meta-llama/Llama-Guard-4-12B | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |
| meta-llama/llama-4-maverick-17b-128e-instruct | 131K tokens | 8K tokens | βœ… | ❌ | ❌ | ❌ | Budget |
| meta-llama/llama-4-scout-17b-16e-instruct | 131K tokens | 8K tokens | βœ… | ❌ | ❌ | ❌ | Budget |
| mistral-saba-24b | 32K tokens | 32K tokens | ❌ | ❌ | ❌ | ❌ | Standard |
| qwen-qwq-32b | 128K tokens | 128K tokens | βœ… | ❌ | ❌ | ❌ | Budget |
| qwen/qwen3-32b | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |

OpenRouter

Here are the OpenRouter models supported by Julep:
| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
| --- | --- | --- | --- | --- | --- | --- | --- |
| deepseek-chat | 65K tokens | 8K tokens | ❌ | ❌ | ❌ | βœ… | Budget |
| eva-llama-3.33-70b | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |
| eva-qwen-2.5-72b | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |
| hermes-3-llama-3.1-70b | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |
| l3.1-euryale-70b | 200K tokens | 100K tokens | βœ… | βœ… | ❌ | βœ… | Enterprise |
| l3.3-euryale-70b | 200K tokens | 100K tokens | βœ… | βœ… | ❌ | βœ… | Enterprise |
| magnum-v4-72b | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |
| mistral-large-2411 | 128K tokens | 128K tokens | βœ… | ❌ | ❌ | ❌ | Premium |
| openrouter/meta-llama/llama-4-maverick | 131K tokens | 8K tokens | βœ… | ❌ | ❌ | ❌ | Budget |
| openrouter/meta-llama/llama-4-maverick:free | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |
| openrouter/meta-llama/llama-4-scout | 131K tokens | 8K tokens | βœ… | ❌ | ❌ | ❌ | Budget |
| openrouter/meta-llama/llama-4-scout:free | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |
| qwen-2.5-72b-instruct | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |

Cerebras

Here are the Cerebras models supported by Julep:
| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cerebras/deepseek-r1-distill-llama-70b | 128K tokens | 128K tokens | βœ… | ❌ | ❌ | ❌ | Standard |
| cerebras/llama-3.1-8b | 128K tokens | 128K tokens | βœ… | ❌ | ❌ | ❌ | Budget |
| cerebras/llama-3.3-70b | 128K tokens | 128K tokens | βœ… | ❌ | ❌ | ❌ | Standard |
| cerebras/llama-4-scout-17b-16e-instruct | 131K tokens | 8K tokens | βœ… | ❌ | ❌ | ❌ | Budget |
| cerebras/qwen-3-32b | 128K tokens | 128K tokens | βœ… | ❌ | ❌ | ❌ | Budget |

Amazon Nova

Here are the Amazon Nova models supported by Julep:
| Model Name | Context Window | Max Output | Tool Calling | Vision | Audio | Caching | Cost Tier |
| --- | --- | --- | --- | --- | --- | --- | --- |
| amazon/nova-lite-v1 | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |
| amazon/nova-micro-v1 | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |
| amazon/nova-pro-v1 | Unknown | Unknown | ❌ | ❌ | ❌ | ❌ | Unknown |

Embedding

Here are the embedding models supported by Julep:
| Model Name | Embedding Dimensions |
| --- | --- |
| Alibaba-NLP/gte-large-en-v1.5 | 1024 |
| BAAI/bge-m3 | 1024 |
| text-embedding-3-large | 1024 |
| vertex_ai/text-embedding-004 | 1024 |
| voyage-3 | 1024 |
| voyage-multilingual-2 | 1024 |
Although the models above natively support different embedding dimensions, Julep currently fixes the embedding dimension at 1024 for all models. We plan to support other dimensions in the future.

Supported Parameters

The following parameters can be used to control the behavior of the models.

Core Parameters
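These are the standard sampling controls passed through to LiteLLM, such as temperature, top_p, and max_tokens.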

Penalty Parameters
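Penalty parameters, such as frequency_penalty and presence_penalty, discourage the model from repeating tokens or topics.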

Advanced Controls
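Advanced options, such as response_format, seed, and stop sequences, give finer control over output structure and determinism.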

Not all parameters are supported by every model. Please refer to the LiteLLM documentation for more details.
Response Format Support: The response_format parameter is supported by OpenAI, Azure OpenAI, Google AI Studio (Gemini), Vertex AI, Bedrock, Anthropic, Groq, xAI (Grok-2+), Databricks, and Ollama. For the most up-to-date list, check the LiteLLM JSON Mode documentation.
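As a hedged sketch, here is one way to request JSON output from a model that supports response_format. The session and chat calls follow Julep's documented shape, but whether response_format is passed through exactly like this depends on your SDK version; confirm against the API reference.

```python
# Sketch: requesting structured JSON output. Assumes `client` and `agent`
# from the earlier example; the response_format pass-through and the
# response shape are assumptions based on LiteLLM's OpenAI-style API.
session = client.sessions.create(agent=agent.id)

response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "List three LLM providers as JSON."}],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)
```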
Best Practices:
  β€’ Start with default values and adjust based on your needs
  β€’ Use a temperature between 0.0 and 1.0 for most cases
  β€’ Avoid setting multiple penalty parameters simultaneously
  β€’ Test different combinations for optimal results
Setting extreme values for multiple parameters may lead to unexpected behavior or poor-quality outputs; the sketch below shows a conservative starting point.
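Here is a hedged sketch of applying conservative settings when creating an agent. The default_settings field and its keys mirror common LiteLLM parameters, but treat them as assumptions and verify against your SDK version; some versions may accept these per request instead.

```python
# Sketch: agent-level generation settings (assumes `client` from the
# earlier example). Verify the default_settings field against your SDK.
agent = client.agents.create(
    name="Support Bot",
    model="gpt-4o-mini",
    about="Answers product questions.",
    default_settings={
        "temperature": 0.3,  # low temperature for consistent answers
        "top_p": 0.9,
        # Avoid combining frequency_penalty and presence_penalty with
        # extreme values, per the guidance above.
    },
)
```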

Usage Guidelines

Consider Model Selection Criteria

  • 1. Your budget and cost constraints
  • 2. How fast you need responses
  • 3. The quality you’re aiming for
  • 4. The context window size you require

Follow Best Practices

  • 1. Start with smaller models for development and testing
  • 2. Use larger context windows only when necessary
  • 3. Keep an eye on token usage to manage costs
For more information, please refer to the LiteLLM documentation.