Overview

Julep leverages LiteLLM to connect you to a wide array of Language Models (LLMs). This integration lets you tap into models from many providers through a single, unified interface.

Because the API is unified, switching between providers is as simple as changing the model name, and functionality stays consistent across the board.
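
For example, here is what provider switching looks like in practice. This is a minimal sketch assuming the Julep Python SDK (`julep`); the API key and agent details are placeholders:

```python
from julep import Julep

# Placeholder key; use your own Julep API key.
client = Julep(api_key="YOUR_JULEP_API_KEY")

# An agent backed by an Anthropic model.
agent = client.agents.create(
    name="writing-assistant",
    model="claude-3.5-sonnet",
    about="Helps draft and edit documents.",
)

# Switching providers is just a different model string, e.g.:
#   model="gpt-4o"          (OpenAI)
#   model="gemini-1.5-pro"  (Google)
```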

Available Models

While we provide API keys for quick testing and development, you’ll need to use your own API keys when deploying to production. This ensures you have full control over your usage and billing.

Looking for top-notch quality? Our curated selection of models delivers excellent outputs for all your use cases.

Anthropic

Here are the Anthropic models supported by Julep:

| Model Name | Context Window | Best For |
| --- | --- | --- |
| claude-3-opus | 200K tokens | Complex reasoning, analysis |
| claude-3-sonnet | 200K tokens | General-purpose tasks |
| claude-3-haiku | 200K tokens | Quick responses |
| claude-3.5-haiku | 200K tokens | Improved reasoning |
| claude-3.5-sonnet | 200K tokens | Improved reasoning |
| claude-3.5-sonnet-20240620 | 200K tokens | Enhanced reasoning capabilities |
| claude-3.5-sonnet-20241022 | 200K tokens | Computer-use capabilities; one of the latest models |
| claude-3.7-sonnet | 200K tokens | Advanced reasoning; the latest model from Anthropic |

Google

Here are the Google models supported by Julep:

| Model Name | Context Window | Best For |
| --- | --- | --- |
| gemini-1.5-pro | 1M tokens | Complex tasks |
| gemini-1.5-pro-latest | 1M tokens | Cutting-edge performance |

OpenAI

Here are the OpenAI models supported by Julep:

| Model Name | Context Window | Best For |
| --- | --- | --- |
| gpt-4-turbo | 128K tokens | Advanced reasoning |
| gpt-4o-mini | 128K tokens | Fast, cost-effective tasks |
| gpt-4o | 128K tokens | Balanced performance |
| o1-mini | 128K tokens | Quick reasoning tasks |
| o1-preview | 128K tokens | Early access to o1-style reasoning |
| o1 | 200K tokens | General reasoning tasks |
| o3-mini | 200K tokens | Reasoning tasks |

Groq

Here are the Groq models supported by Julep:

| Model Name | Context Window | Best For |
| --- | --- | --- |
| llama-3.1-70b | 8K tokens | Long-form content |
| llama-3.1-8b | 8K tokens | Quick processing |

OpenRouter

Here are the OpenRouter models supported by Julep:

| Model Name | Context Window | Best For |
| --- | --- | --- |
| mistral-large-2411 | 128K tokens | High performance |
| qwen-2.5-72b-instruct | 131K tokens | Complex instructions |
| eva-llama-3.33-70b | 128K tokens | Story writing and creative fiction |
| l3.1-euryale-70b | 128K tokens | Poetry and artistic writing |
| l3.3-euryale-70b | 128K tokens | Advanced creative writing and roleplay |
| magnum-v4-72b | 8K tokens | Content generation and brainstorming |
| eva-qwen-2.5-72b | 8K tokens | Creative problem solving and ideation |
| hermes-3-llama-3.1-70b | 8K tokens | Narrative design and worldbuilding |
| deepseek-chat | 32K tokens | Conversational AI |

Cerebras

Here are the Cerebras models supported by Julep:

| Model Name | Context Window | Best For |
| --- | --- | --- |
| cerebras/llama-3.1-8b | 8K tokens | Quick creative writing and basic text generation |
| cerebras/llama-3.3-70b | 8K tokens | Complex creative writing, storytelling, and detailed content generation |

Embedding

Here are the embedding models supported by Julep:

| Model Name | Embedding Dimensions | Best For |
| --- | --- | --- |
| text-embedding-3-large | 1024 | High-quality vectors |
| voyage-multilingual-2 | 1024 | Cross-language tasks |
| voyage-3 | 1024 | Advanced embeddings |
| Alibaba-NLP/gte-large-en-v1.5 | 1024 | Cost-effective solutions |
| BAAI/bge-m3 | 1024 | Cost-effective solutions |
| vertex_ai/text-embedding-004 | 1024 | Google Cloud integration |

Though the models mentioned above natively support different embedding dimensions, Julep currently uses a fixed dimension of 1024 for all embedding models. We plan to support other dimensions in the future.

Supported Parameters

Below is a list of the parameters that can be used to control the behavior of the models.

| Parameter | Range | Description |
| --- | --- | --- |
| temperature | 0.0 to 5.0 | Controls randomness in outputs. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused and deterministic |
| top_p | 0.0 to 1.0 | Alternative to temperature for nucleus sampling. Only tokens with cumulative probability < top_p are considered. We recommend adjusting either this or temperature, not both |
| max_tokens | ≥ 1 | Maximum number of tokens to generate in the response |
| frequency_penalty | -2.0 to 2.0 | Penalizes tokens based on their frequency in the text. Positive values decrease repetition |
| presence_penalty | -2.0 to 2.0 | Penalizes tokens based on their presence in the text. Positive values decrease the likelihood of repeating content |
| repetition_penalty | 0.0 to 2.0 | Penalizes repetition (1.0 is neutral). Values > 1.0 reduce the likelihood of repeating content |
| length_penalty | 0.0 to 2.0 | Penalizes based on generation length (1.0 is neutral). Values > 1.0 penalize longer generations |
| min_p | 0.0 to 1.0 | Minimum probability threshold relative to the highest token probability |
| seed | integer | Set a specific seed for deterministic, reproducible generation |
| stop | list[str] | Up to 4 sequences where generation should stop |

Not all parameters are supported by every model. Please refer to the LiteLLM documentation for more details.
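
As an illustration, these parameters are passed alongside a chat request. The following is a hedged sketch assuming the Julep Python SDK's `client.sessions.create` and `client.sessions.chat` calls, reusing the `client` and `agent` from the earlier example; field names may differ slightly across SDK versions:

```python
# Hedged sketch: passing sampling parameters with a chat request.
session = client.sessions.create(agent=agent.id)

response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "Summarize this document."}],
    temperature=0.2,   # low randomness for focused output
    max_tokens=512,    # cap the response length
    stop=["\n\n"],     # stop at the first blank line (up to 4 sequences)
)

# Response shape may vary; check the API reference for the exact fields.
print(response.choices[0].message.content)
```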

Best Practices:

  • Start with default values and adjust based on your needs
  • Use temperature (0.0 - 1.0) for most cases
  • Avoid setting multiple penalty parameters simultaneously
  • Test different combinations for optimal results

Setting extreme values for multiple parameters may lead to unexpected behavior or poor quality outputs.
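
For instance, when testing different parameter combinations you can pin the seed so runs stay comparable (same hedged assumptions as the sketch above):

```python
# Hedged sketch: reproducible generation for testing.
response = client.sessions.chat(
    session_id=session.id,
    messages=[{"role": "user", "content": "Give me three taglines."}],
    temperature=0.0,  # minimize randomness
    seed=42,          # reproducible results where the model supports it
)
```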

Usage Guidelines

Consider Model Selection Criteria

  1. Your budget and cost constraints
  2. How fast you need responses
  3. The quality you're aiming for
  4. The context window size you require

Follow Best Practices

  1. Start with smaller models for development and testing (see the sketch below)
  2. Use larger context windows only when necessary
  3. Keep an eye on token usage to manage costs
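
A common way to follow the first point is to pick the model from your environment, so development stays cheap while production gets the larger model. This is a hypothetical pattern, not a Julep-specific feature:

```python
import os

# Hypothetical pattern: smaller model in development, larger in production.
MODEL = "gpt-4o" if os.getenv("ENV") == "production" else "gpt-4o-mini"

agent = client.agents.create(
    name="assistant",
    model=MODEL,
    about="General-purpose assistant.",
)
```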

For more information, please refer to the LiteLLM documentation.