Jay Taylor's notes
BerriAI/litellm: Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDI
LiteLLM Proxy Server (LLM Gateway) | Hosted Proxy (Preview) | Enterprise Tier
Stable Release: Use docker images with the -stable tag. These have undergone 12-hour load tests before being published. More information about the release cycle here
Support for more providers: if a provider or LLM platform is missing, raise a feature request.
Usage (Docs)
LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here
LiteLLM v1.40.14+ now requires pydantic>=2.0.0. No changes required.
pip install litellm
from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="openai/gpt-4o", messages=messages)

# anthropic call
response = completion(model="anthropic/claude-sonnet-4-20250514", messages=messages)
print(response)
{
"id": "chatcmpl-1214900a-6cdd-4148-b663-b5e2f642b4de",
"created": 1751494488,
"model": "claude-sonnet-4-20250514",
"object": "chat.completion",
"system_fingerprint": null,
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "Hello! I'm doing well, thank you for asking. I'm here and ready to help with whatever you'd like to discuss or work on. How are you doing today?",
"role": "assistant",
"tool_calls": null,
"function_call": null
}
}
],
"usage": {
"completion_tokens": 39,
"prompt_tokens": 13,
"total_tokens": 52,
"completion_tokens_details": null,
"prompt_tokens_details": {
"audio_tokens": null,
"cached_tokens": 0
},
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0
}
}
Call any model supported by a provider, with model=<provider_name>/<model_name>. Provider-specific details may apply, so refer to the provider docs for more information.
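The `<provider_name>/<model_name>` convention can be illustrated with a small helper. This is only a sketch: `split_model` is a hypothetical name, and it assumes that everything before the first slash is the provider. LiteLLM's own routing may handle more cases than this.

```python
from typing import Tuple


def split_model(model: str) -> Tuple[str, str]:
    """Split "<provider_name>/<model_name>" at the first slash.

    Hypothetical helper for illustration; not part of the LiteLLM API.
    """
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected '<provider>/<model>', got {model!r}")
    return provider, name


print(split_model("openai/gpt-4o"))
# Model names may themselves contain slashes; only the first one is split on.
print(split_model("huggingface/bigcode/starcoder"))
```

Splitting on the first slash only keeps names such as `bigcode/starcoder` intact, which matches how the `huggingface/bigcode/starcoder` string is written elsewhere in this README.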
Async (Docs)
from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="openai/gpt-4o", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)
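One reason to use the async interface is to send the same prompt to several models concurrently. The sketch below assumes a coroutine with the same keyword arguments as `acompletion` (`model`, `messages`); `fan_out` and `fake_acompletion` are hypothetical names, and the fake stands in for the real call so the example runs without API keys.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Message = Dict[str, str]
CompletionFn = Callable[..., Awaitable[Any]]


async def fan_out(call: CompletionFn, models: List[str], messages: List[Message]) -> List[Any]:
    """Send the same messages to every model concurrently.

    asyncio.gather returns results in the same order as `models`.
    """
    tasks = [call(model=m, messages=messages) for m in models]
    return await asyncio.gather(*tasks)


async def fake_acompletion(model: str, messages: List[Message]) -> str:
    """Stand-in for litellm.acompletion (assumption: same keyword arguments)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{model} received {len(messages)} message(s)"


messages = [{"content": "Hello, how are you?", "role": "user"}]
models = ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"]
results = asyncio.run(fan_out(fake_acompletion, models, messages))
print(results)
```

With real credentials set, passing `acompletion` in place of `fake_acompletion` should give the same fan-out behaviour, though error handling and rate limits are left out of this sketch.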
Streaming (Docs)
LiteLLM supports streaming the model response back. Pass stream=True to get a streaming iterator in response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

# gpt-4o
response = completion(model="openai/gpt-4o", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")

# claude sonnet 4
response = completion('anthropic/claude-sonnet-4-20250514', messages, stream=True)
for part in response:
    print(part)
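The loop above prints each piece as it arrives. To assemble the full reply instead, the pieces can be joined. The following is a minimal sketch: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only mimic the `choices[0].delta.content` shape so the example runs without calling a model.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def collect_stream(chunks: Iterable[Any]) -> str:
    """Join the text of every chunk, skipping chunks with no content."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:  # content can be None, for example on the final chunk
            parts.append(text)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Build an object with the same attribute path as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hello"), fake_chunk(", world"), fake_chunk(None)]
print(collect_stream(demo))  # Hello, world
```

The sample chunk that follows shows where `delta.content` sits in a real streamed response.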
{
"id": "chatcmpl-fe575c37-5004-4926-ae5e-bfbc31f356ca",
"created": 1751494808,
"model": "claude-sonnet-4-20250514",
"object": "chat.completion.chunk",
"system_fingerprint": null,
"choices": [
{
"finish_reason": null,
"index": 0,
"delta": {
"provider_specific_fields": null,
"content": "Hello",
"role": "assistant",
"function_call": null,
"tool_calls": null,
"audio": null
},
"logprobs": null
}
],
"provider_specific_fields": null,
"stream_options": null,
"citations": null
}
Logging Observability (Docs)
LiteLLM exposes predefined callbacks to send data to Lunary, MLflow, Langfuse, DynamoDB, S3 buckets, Helicone, Promptlayer, Traceloop, Athina, Slack
import os
import litellm
from litellm import completion

## set env variables for logging tools (when using MLflow, no API key set up is required)
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["ATHINA_API_KEY"] = "your-athina-api-key"

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["lunary", "mlflow", "langfuse", "athina", "helicone"] # log input/output to lunary, mlflow, langfuse, athina, helicone

# openai call
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi - i'm openai"}])
LiteLLM Proxy Server (LLM Gateway) - (Docs)
Track spend + Load Balance across multiple projects
The proxy provides:
Proxy Endpoints - Swagger Docs
pip install 'litellm[proxy]'

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:4000

import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=[
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])
print(response)
Proxy Key Management (Docs)
Connect the proxy with a Postgres DB to create proxy keys
# Get the code
git clone https://github.com/BerriAI/litellm

# Go to folder
cd litellm

# Add the master key - you can change this after setup
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env

# Add the litellm salt key - you cannot change this after adding a model
# It is used to encrypt / decrypt your LLM API Key credentials
# We recommend - https://1password.com/password-generator/
# password generator to get a random hash for litellm salt key
echo 'LITELLM_SALT_KEY="sk-1234"' >> .env

source .env

# Start
docker compose up
UI on /ui on your proxy server
Set budgets and rate limits across multiple projects
POST /key/generate
curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4", "claude-2"], "duration": "20m","metadata": {"user": "ishaan@berri.ai", "team": "core-infra"}}'
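The same request can be prepared from Python with only the standard library. This is a sketch under assumptions: `build_key_request` is a hypothetical helper, the URL and master key are the placeholder values from the curl command, and the request is only built here, not sent.

```python
import json
import urllib.request
from typing import Dict, List


def build_key_request(
    base_url: str,
    master_key: str,
    models: List[str],
    duration: str,
    metadata: Dict[str, str],
) -> urllib.request.Request:
    """Build (but do not send) a POST /key/generate request."""
    payload = {"models": models, "duration": duration, "metadata": metadata}
    return urllib.request.Request(
        url=f"{base_url}/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_key_request(
    base_url="http://0.0.0.0:4000",
    master_key="sk-1234",  # placeholder master key from the setup step
    models=["gpt-3.5-turbo", "gpt-4", "claude-2"],
    duration="20m",
    metadata={"user": "ishaan@berri.ai", "team": "core-infra"},
)
print(req.get_method(), req.full_url)
# With a proxy running, urllib.request.urlopen(req) would send the request.
```

Building the request separately from sending it makes the payload easy to inspect before any key is actually created. The expected response is the same as for the curl call.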
{
"key": "sk-kdEXbIqZRwEeEiHwdg7sFA", # Bearer token
"expires": "2023-11-19T01:38:25.838000+00:00" # datetime object
}
Supported Providers (Website Supported Models | Docs)
- Setup .env file in root
- Run dependent services
  docker-compose up db prometheus
- (In root) create virtual environment
  python -m venv .venv
- Activate virtual environment
  source .venv/bin/activate
- Install dependencies
  pip install -e ".[all]"
- Start proxy backend
  python litellm/proxy_cli.py
- Navigate to ui/litellm-dashboard
- Install dependencies
  npm install
- Run npm run dev to start the dashboard
For companies that need better security, user management and professional support
This covers:
- ✅ Features under the LiteLLM Commercial License:
- ✅ Feature Prioritization
- ✅ Custom Integrations
- ✅ Professional Support - Dedicated discord + slack
- ✅ Custom SLAs
- ✅ Secure access with Single Sign-On
We welcome contributions to LiteLLM! Whether you're fixing bugs, adding features, or improving documentation, we appreciate your help.
This requires poetry to be installed.
git clone https://github.com/BerriAI/litellm.git
cd litellm
make install-dev    # Install development dependencies
make format         # Format your code
make lint           # Run all linting checks
make test-unit      # Run unit tests
make format-check   # Check formatting only
For detailed contributing guidelines, see CONTRIBUTING.md.
LiteLLM follows the Google Python Style Guide.
Our automated checks include:
- Black for code formatting
- Ruff for linting and code quality
- MyPy for type checking
- Circular import detection
- Import safety checks
All these checks must pass before your PR can be merged.
- Schedule Demo
- Community Discord
- Community Slack
- Our numbers +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
- Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.